Negative Clue: How Big Data Can Detect Diseases Early

Medtelkasten Article #2

I want to open this up with a story. A mystery, if you will. Can you figure out the answer?

A detective walked into a hardware store on his day off, hoping for some peace and quiet. As he was quietly browsing the racks, his phone rang. It was his officer friend, Kirby. Kirby said he had a suspect but wasn’t sure he had the right man, and he wanted a second opinion if the detective was nearby. Curiosity piqued, the detective agreed to weigh in.

Minutes later, Kirby arrives with the suspect in tow. The suspect had an unkempt beard, with deep wrinkles in his forehead. His appearance lent an uneasiness to his presence in the store. Kirby began to speak. 

“He says he couldn’t have been the perpetrator because he’s been working at the docks all morning. He’s a fisherman… We searched him and turned up nothing, so I’m a bit stumped really. He kept saying that he was just at work, and he was going home early because he didn’t feel too well.” He paused in thought before continuing. “His features match the eyewitness description and I’m still a bit uncertain. Story checks out, he does work at the dock…” The detective raised an eyebrow.

“So you think he did it?” 

Kirby responded sheepishly. “No, I’m afraid not. He’s the prime suspect in this case, however, and I don’t know how to test his alibi. There’s just something off about the whole thing.”

The detective stood up and walked over to the shifty fisherman. “So you were working at the docks all morning?” The fisherman nodded in affirmation. “And you were going straight home after work? No stops anywhere else?” The fisherman nodded once more. The detective straightened up and looked at his friend. 

“It’s him, Kirby,” the detective said. “It’s him.” Kirby was shocked. 

“But how do you know he’s lying?”

Can you figure out how the detective knew the suspect was lying?

I’ll give you some time to think about it. Perhaps another example will help you understand. 

An unknown virus is spreading across the planet at warp speed. Its spread is troubling, and it must be contained in its early stages to save lives. However, our resources are limited. We can only divert so much to each area before we are forced to cut back.

Do we have anything at our disposal (data, tools, anything) to help us predict where an outbreak will occur, or to find areas that are at risk of becoming hotspots and thus need our intervention?

Consider the following. Every time we type into a search box, we reveal a little about ourselves. This little area houses our embarrassing questions: the ones we’re afraid to ask out loud, the things we fear and the things we wish to understand. By using anonymized, aggregated data, it could be possible to tease out subtle patterns.

Say someone googles “pancreatic cancer what should i do?” In that case, sure, you can easily guess that the person searching may have pancreatic cancer. But what would happen if you worked your way backwards? What if you took all the people who searched “just diagnosed with pancreatic cancer” and trawled through the weeks or months leading up to that search to look for any health symptoms they googled? That could be done. As humans with obligations and limited attention, we search and we forget. But the search engine remembers. In a way, it’s a memex for symptoms.

And if you’re thinking this is a bit far-fetched, here’s the kicker: Microsoft already did this with Bing data. They found that searching “indigestion” and then “abdominal pain” correlated with pancreatic cancer, while searching only “indigestion” correlated with a lower likelihood of having it.
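To make the working-backwards idea concrete, here is a minimal Python sketch. It is not Microsoft's method, just an illustration under stated assumptions: the log entries are made up, the 90-day window is an arbitrary choice, and the names `LOG`, `SYMPTOMS` and `symptoms_before_marker` are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# Toy, made-up log of (anonymous user id, day, query); not real search data.
LOG = [
    ("u1", date(2020, 1, 5), "indigestion"),
    ("u1", date(2020, 1, 20), "abdominal pain"),
    ("u1", date(2020, 3, 1), "just diagnosed with pancreatic cancer"),
    ("u2", date(2020, 2, 1), "indigestion"),
    ("u3", date(2019, 1, 1), "indigestion"),
    ("u3", date(2020, 1, 10), "abdominal pain"),
    ("u3", date(2020, 2, 15), "just diagnosed with pancreatic cancer"),
]
# Assumed list of symptom queries we care about.
SYMPTOMS = {"indigestion", "abdominal pain"}


def symptoms_before_marker(log, marker, window_days=90):
    """Count users who searched each symptom shortly before their first marker query."""
    # Step 1: find the first day each user typed the marker query.
    first_marker = {}
    for user, day, query in log:
        if query == marker and (user not in first_marker or day < first_marker[user]):
            first_marker[user] = day

    # Step 2: look back from that day and collect symptom queries in the window.
    seen = set()  # (user, symptom) pairs, so each user counts once per symptom
    for user, day, query in log:
        if user in first_marker and query in SYMPTOMS:
            start = first_marker[user] - timedelta(days=window_days)
            if start <= day < first_marker[user]:
                seen.add((user, query))
    return Counter(symptom for _, symptom in seen)


counts = symptoms_before_marker(LOG, "just diagnosed with pancreatic cancer")
```

In this toy log, two users searched “abdominal pain” in the 90 days before their marker query and one searched “indigestion”. A real analysis would also need a comparison group of people who never typed the marker query. Otherwise a common symptom looks meaningful simply because everyone searches for it.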

This is significant because, when it comes to fighting many diseases, early diagnosis is critical. In the future, perhaps, trained professionals could harness these searches as a diagnostic supplement. While there is a long way to go before we can clear this for use on a case-by-case basis, the possibility is intriguing.

Psst! Here’s the paper if you’d like to read it.

Clusters of search engine data localized to an area are something we could use to inspect and predict outbreaks based on search volumes. Spikes in certain search phrases could inform our understanding of the movement and potential symptoms of any rapidly spreading disease.
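Spotting a spike can be sketched just as simply. The snippet below flags any week whose count rises well above the average of the few weeks before it. It is a rough illustration, not an established surveillance method: the weekly counts are invented, and the function name `find_spikes`, the four-week baseline and the threshold of three standard deviations are all assumptions.

```python
from statistics import mean, stdev


def find_spikes(counts, baseline=4, threshold=3.0):
    """Return indices of weeks that exceed the trailing baseline by `threshold` deviations."""
    spikes = []
    for i in range(baseline, len(counts)):
        window = counts[i - baseline:i]  # the weeks just before week i
        mu, sigma = mean(window), stdev(window)
        # Floor the deviation at 1.0 so a very flat baseline does not flag tiny wiggles.
        if counts[i] > mu + threshold * max(sigma, 1.0):
            spikes.append(i)
    return spikes


# Made-up weekly counts of a symptom search (say, "fever") in one region.
weekly_searches = [10, 12, 11, 9, 10, 11, 40, 55]
spike_weeks = find_spikes(weekly_searches)
```

Here only week 6 is flagged. Week 7 is even higher, but by then the jump in week 6 has already inflated the baseline. That is one reason real systems compare against a longer or seasonally adjusted history.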

In another example, researchers have been able to map COVID-19 outbreaks in India using Google Trends. The data is there; we just need the means and expertise to meaningfully interpret it. In a way, these aren’t obvious clues. They’re invisible: negative clues.

Now back to the store with the detective. He claims the suspect is lying about working at the docks all morning and is thus the perpetrator of the crime.

Let’s hear his reasoning. 

The detective smiled at the skeptical officer. 

“Simple. He says he’s been working at the docks all morning, presumably handling all sorts of fish and aquatic animals. If that were really true, as soon as he walked in, we would’ve experienced the pungent odor of these ‘fish’.”

The missing odor was the negative clue.

The moral of the story is to keep an eye out for the non-obvious. We would do well to ask ourselves: what other negative clues elude us? 
