Big Data in Healthcare: How an Industry Became a Pokemon Trainer

Published September 19, 2017 | 5 min read

Big Data in healthcare offers insights into health trends that help providers better assess patients. Anyone who’s ever played Pokemon knows that trainers collect various pocket monsters and use them to battle one another in gyms. As the song goes, you “gotta catch ’em all!” Healthcare providers using Big Data do something similar: they compile vast amounts of information and watch for patterns that can better predict diagnoses and treatment outcomes.

Where does Big Data in healthcare come from?

Record keeping, regulatory requirements, and patient care generate large amounts of data within the healthcare industry. As more providers move to electronic records systems, these data repositories keep growing.


The shift to electronic records makes it possible to aggregate providers’ clinical notes. These notes once lived on paper in locked drawers; now they are part of larger organizational databases. The aggregated data also includes test results and readings from medical devices, such as heart rate monitors, that are connected to the Internet of Things (IoT). These results can then be integrated with the patient data stored in electronic health records (EHRs).
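To make the aggregation step concrete, here is a minimal Python sketch that attaches IoT device readings to EHR records through a shared patient identifier. The record layouts, field names (`patient_id`, `metric`, `value`), and sample values are hypothetical; real systems use standardized schemas and have to match patient identities across systems far more carefully than a simple key lookup.

```python
from collections import defaultdict

# Hypothetical sample data; field names are illustrative, not a real EHR schema.
ehr_records = [
    {"patient_id": "P001", "age": 64, "diagnoses": ["hypertension"]},
    {"patient_id": "P002", "age": 47, "diagnoses": []},
]

device_readings = [
    {"patient_id": "P001", "metric": "heart_rate", "value": 88},
    {"patient_id": "P001", "metric": "heart_rate", "value": 92},
    {"patient_id": "P002", "metric": "heart_rate", "value": 71},
]


def aggregate(ehr, readings):
    """Attach each patient's device readings to their EHR record, keyed by patient_id."""
    by_patient = defaultdict(list)
    for reading in readings:
        by_patient[reading["patient_id"]].append(reading)

    combined = {}
    for record in ehr:
        pid = record["patient_id"]
        # Copy the EHR fields and add the matching device readings (empty list if none).
        combined[pid] = {**record, "device_readings": by_patient.get(pid, [])}
    return combined


merged = aggregate(ehr_records, device_readings)
print(len(merged["P001"]["device_readings"]))  # 2
```

Grouping the readings first keeps the join to a single pass over each data set, which matters once the device stream is much larger than the record list.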


How is Big Data in healthcare like Pokemon?

Besides catching the little monsters, a trainer’s goal in Pokemon is to help the Professor gather information about them for research. In the same way, collected healthcare information works as a real-life Pokedex, the index of every Pokemon a trainer has seen or caught.


A trainer’s Pokedex shows each pocket monster’s information to help the trainer understand its strengths, weaknesses, and abilities. The more Pokemon a trainer has, the better the chance of finding one strong enough to defeat a gym leader. The large data repositories in the healthcare industry provide the same kind of information about treatments and drugs.


The more information in the Big Data set, the more opportunities the healthcare industry has to defeat diseases.

How does Big Data in healthcare help providers?

When making healthcare decisions, patients need to understand all the possible risks, and how much a risk estimate can be trusted depends on how much data is behind it. In a data set of five people, one of whom died, the fatality rate is 20%. In a data set of 100 people, five of whom died, it is 5%; among 1 million people with 100,000 deaths, it is 10%. In the five-person data set, a single death moves the rate by 20 percentage points, while in the million-person data set one additional death barely registers. With more data points, estimates become more stable and predictions more accurate. When the question is one of life or death, Big Data can support more reliable decisions. Big Data is commonly described by the three “V’s”: variety, volume, and velocity.
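One way to quantify “more data, more reliable” is a confidence interval around each fatality rate. The Python sketch below uses the Wilson score interval, a standard method for proportions that I have chosen for illustration (the article does not name one), applied to the three hypothetical data sets from the paragraph.

```python
import math


def wilson_interval(deaths: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion deaths / n."""
    p = deaths / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# The three example data sets: (deaths, people)
datasets = [(1, 5), (5, 100), (100_000, 1_000_000)]

for deaths, n in datasets:
    lo, hi = wilson_interval(deaths, n)
    print(f"{deaths}/{n}: rate {deaths / n:.1%}, 95% CI {lo:.2%} to {hi:.2%}")
```

For the five-person data set the interval runs from roughly 4% to 62%, so the 20% figure says very little. For the million-person data set the interval is only about ±0.06 percentage points wide. More data does not correct a biased sample, but it does shrink random uncertainty.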



Variety

Anyone who’s ever played Pokemon, or lived with a kid who collects Pokemon, knows that they come in a variety of “types,” such as rock, electric, water, fire, and psychic. Some are dual-type Pokemon with multiple powers, such as an “electric rock dual type.”

Pokemon Trainers seek to catch all the Pokemon to create a varied and strong Pokedex.


Big Data repositories do the same thing. By combining and analyzing different types of data, the healthcare industry can better match treatments to outcomes or predict patient risk. Big Data sets can incorporate not only financial and clinical data but also genomic data. This kind of aggregation can help providers determine whether a condition is more closely linked to socioeconomic status or to a family history of the disease. Instead of looking at just one factor, the medical community can now account for the combined effects of several types of factors to make better predictions.
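To illustrate what accounting for “combined effects” can look like, the following Python sketch scores risk with a logistic function over several binary factors drawn from different data types. The factor names and coefficients are hypothetical, chosen only for illustration and not taken from any study; a real model would be fit to, and validated against, actual patient data.

```python
import math

# Hypothetical intercept and weights, for illustration only (not fitted to real data).
INTERCEPT = -3.0
WEIGHTS = {
    "family_history": 1.2,   # clinical/genomic factor (1 = present)
    "low_income_area": 0.8,  # socioeconomic proxy (1 = yes)
    "smoker": 0.9,           # behavioral factor (1 = yes)
}


def risk(features: dict[str, int]) -> float:
    """Logistic risk score: sums weighted factors, then maps the total to a probability."""
    score = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-score))


only_history = risk({"family_history": 1})
combined = risk({"family_history": 1, "low_income_area": 1, "smoker": 1})
print(f"family history alone: {only_history:.1%}; all three factors: {combined:.1%}")
```

With these made-up weights, family history alone gives about 14.2%, while all three factors together give about 47.5%. Looking at any single factor in isolation would understate the risk.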



Volume

Look around the house of anyone with Pokemon cards, and you’ll find them strewn everywhere or collected in binder upon binder upon binder. The sheer volume of Pokemon collections speaks to the soul of the Big Data analyst.


As the healthcare industry becomes more automated, so does data collection. Hospitals are beginning to implement systems that move information electronically between connected care providers. As more providers connect, the cumulative volume of information grows. The cycle continues: each newly connected provider adds to the volume, and the larger data set supports more sophisticated analysis.
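The article doesn’t say which format these systems use to move information. HL7 FHIR is one widely adopted interoperability standard, so the sketch below assumes it. It builds a minimal FHIR R4-style Observation for a single heart-rate reading (LOINC code 8867-4) and serializes it as JSON. The patient ID and timestamp are hypothetical, and a production resource would usually carry more fields and be validated against the applicable FHIR profiles.

```python
import json


def heart_rate_observation(patient_id: str, bpm: int, timestamp: str) -> dict:
    """Build a minimal FHIR R4-style Observation resource for one heart-rate reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "vital-signs",
            }]
        }],
        "code": {
            # LOINC 8867-4 identifies a heart rate measurement.
            "coding": [{"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": timestamp,
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }


# Hypothetical patient ID and timestamp.
payload = json.dumps(heart_rate_observation("example", 72, "2017-09-19T09:30:00Z"), indent=2)
print(payload)
```

Because every connected provider reads the same structure, a reading recorded in one system can be added to another provider’s data set without custom translation.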



Velocity

Velocity is the speed at which data is generated and must be processed, and smartwatches and connected IoT devices are helping healthcare feed the need for speed. Some doctors use connected devices to help track patients’ fitness. For example, in February 2017, Nokia announced plans to overhaul its Health Mate app and launch a platform where people could share their information with their doctors. Nokia also developed a HIPAA-compliant information-sharing application, the Patient Care Platform, that allowed healthcare providers to access patients’ Health Mate data and monitor some vital signs.
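Handling velocity means processing readings as they arrive rather than in periodic batches. This minimal Python sketch assumes a simulated stream of heart-rate values and flags every point where the rolling average of the most recent readings crosses a threshold. The window size, the 100 bpm threshold, and the sample values are hypothetical placeholders, not clinical guidance.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_alerts(readings: Iterable[float], window: int = 5,
                   threshold: float = 100.0) -> Iterator[tuple[int, float]]:
    """Yield (index, rolling_mean) whenever the mean of the last `window` readings exceeds `threshold`."""
    recent = deque(maxlen=window)
    for i, bpm in enumerate(readings):
        recent.append(bpm)  # once full, the deque drops the oldest reading automatically
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield i, mean


# Simulated smartwatch stream in beats per minute; values are made up.
stream = [80, 82, 85, 90, 95, 110, 118, 121, 125, 99]
alerts = list(rolling_alerts(stream))
for index, mean in alerts:
    print(f"reading {index}: rolling mean {mean:.1f} bpm exceeds threshold")
```

For this sample stream, alerts fire at readings 7, 8, and 9. Averaging over a window keeps a single noisy spike from triggering an alert, and it works one reading at a time, so it never needs the full history in memory.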


As more patients use smartphones and smartwatches to track their health, they create more data collection points. Not all of these data points will be useful, but the information needs to be protected regardless. This creates a new concern for companies that need to be HIPAA compliant.



A fourth V

Yes, this started with the three V’s of Big Data, but according to an article on the National Center for Biotechnology Information (NCBI) website, practitioners and researchers add a fourth V. Big Data analysis traditionally assumes that the information is easy to “read,” i.e., that the files are machine-readable and can be processed electronically. However, anyone who’s ever joked about doctors’ handwriting knows that this isn’t always the case in healthcare.


Therefore, to pool Big Data in healthcare, analysts need different and better tools that can handle the industry’s specific problems, such as handwritten or otherwise unstructured records.


Where do Big Data and HIPAA intersect?


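HIPAA requires covered entities and their business associates to safeguard protected health information (PHI). Encryption is one of the safeguards addressed by the HIPAA Security Rule, and data that has been properly de-identified is no longer considered PHI. When it comes to using Big Data in healthcare, these requirements are causing consternation.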


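To de-identify information so that it is no longer PHI, organizations must meet the standard in 45 CFR 164.514(a) using one of the methods in 164.514(b): a formal expert determination, or the “Safe Harbor” removal of 18 specified identifiers. However, these procedures are in tension with Big Data aggregation, because the details they strip out are often the very details analysts want to study.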


The Safe Harbor method of the HIPAA Privacy Rule (45 CFR 164.514(b)(2)) lists the following among the 18 identifiers that must be removed:


(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:

(1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and

(2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000.
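The ZIP code rule is mechanical enough to express in a few lines. This Python sketch generalizes a ZIP code to its first three digits, or to “000” when the area formed by that three-digit prefix contains 20,000 or fewer people. The population table holds hypothetical placeholder figures; the rule itself requires current publicly available Census Bureau data. The sketch also handles only this one identifier, while Safe Harbor requires dealing with all 18.

```python
# Hypothetical populations per 3-digit ZIP prefix (made-up figures). A real
# implementation must use current publicly available Census Bureau data.
PREFIX_POPULATION = {
    "021": 1_500_000,
    "036": 15_000,
}


def safe_harbor_zip(zip_code: str, prefix_population: dict[str, int]) -> str:
    """Generalize a ZIP code under the Safe Harbor ZIP rule.

    Keeps only the first three digits, and replaces them with "000" when the
    area for that prefix contains 20,000 or fewer people.
    """
    prefix = zip_code.strip()[:3]
    # Unknown prefixes are treated as 20,000 or fewer people (the conservative choice).
    population = prefix_population.get(prefix, 0)
    return prefix if population > 20_000 else "000"


print(safe_harbor_zip("02139", PREFIX_POPULATION))  # 021
print(safe_harbor_zip("03601", PREFIX_POPULATION))  # 000
```

Because the three-digit prefix survives for populous areas, analysts can still group outcomes by region, just at a coarser resolution than full ZIP codes allow.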


One of the most important factors in predictive Big Data analysis is geographic location, which serves as a proxy for socioeconomic status. This means that while you’ve gotta catch ’em all, you can’t necessarily use them all in your analysis unless your handling of that data is HIPAA compliant.


How protecting Big Data in conjunction with HIPAA compliance can make you a Healthcare Pokemon Master Trainer

Even if your organization doesn’t provide medical care, you may be a link in the HIPAA chain of trust as either a covered entity or a Business Associate of one. If you are a cloud service provider or data modeling center for a hospital, your customer may be using Big Data to help understand its patient population. That means it needs to be able to trust your servers with the information in its database.


Pokemon and Big Data have one more commonality: trainers and healthcare professionals need to keep their assets safe. Just as a trainer doesn’t want the Pokemon to escape, the healthcare professional doesn’t want the information to be breached. To be a master healthcare service vendor, you need to show your customers that you’ve locked down that information.


How to use GRC automation to prove master status

Potential customers need to know that you’re not only keeping information safe but also responding to their concerns. That means answering questions about your compliance quickly.


GRC automation makes providing documentation easy. For example, ZenGRC’s “System of Records” function allows you to pull up an audit report quickly and prove that your controls are working. Instead of having to find someone to ask, you can quickly provide your customer with the assurance they need.


In addition, you can set different authorizations for viewing and editing, so all members of your organization can access the information they need. Your internal auditor no longer has to act as the gatekeeper to information, and doesn’t have to worry that someone in a different department will contaminate the data.
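Separate viewing and editing authorizations are commonly implemented with role-based access control. The Python sketch below shows the general idea with deny-by-default lookups. The role names and permission set are hypothetical and do not describe ZenGRC’s actual configuration model.

```python
from enum import Enum, auto


class Permission(Enum):
    VIEW = auto()
    EDIT = auto()


# Hypothetical roles for illustration; not any product's real configuration.
ROLE_PERMISSIONS = {
    "internal_auditor": {Permission.VIEW, Permission.EDIT},
    "control_owner": {Permission.VIEW, Permission.EDIT},
    "department_viewer": {Permission.VIEW},
}


def can(role: str, permission: Permission) -> bool:
    """Return True only if the role explicitly grants the permission (deny by default)."""
    return permission in ROLE_PERMISSIONS.get(role, set())


print(can("department_viewer", Permission.VIEW))  # True
print(can("department_viewer", Permission.EDIT))  # False
```

Denying by default means a misspelled or newly created role gets no access until someone grants it, which is the safer failure mode when the data is sensitive.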


As Big Data becomes more important to the healthcare industry, more data analytics providers will need to become HIPAA compliant. As your compliance program expands, you want solutions that make it painless, and automation is the key to keeping up with your shifting needs.


For more information on using automation to ease your compliance pain, download our eBook Compliance Management Best Practices: When Will Excel Crush You?
