
Big data – detecting pathological events and the cost of false alarms

As the Internet of Things develops, the world is accumulating data rapidly.

One of the intriguing ideas is the possibility of extracting information of value from datasets that were not designed or gathered for the purpose of eliciting that information – even to the point of saying “let us gather lots of information and then, as an afterthought, analyze it to discover valuable patterns”. For medical data systems, there is interest in gathering large datasets from patient populations and then analyzing the data to identify signals and to discriminate between pathologies.


Recently, we upgraded our central heating controls to a type that allows wireless control of the system from a mobile device. The temperature records, stored by the provider in the cloud, allow the temperature of each room to be viewed minute by minute over the last week. Not exactly the most exciting viewing – however, I have noticed how clearly the data shows activity around the house: doors being opened and closed, diurnal temperature variation, sunny intervals and the like.

Whilst on holiday recently I thought it a good idea to check on the house, so I looked at the temperature records remotely (sad, but this was a slightly damp English Lake District destination). I was disturbed to see, on one evening, a sudden temperature drop of a couple of degrees in multiple rooms at 9.30pm followed by a recovery, with a shape consistent with doors being opened and closed. Based on what I had previously observed in the data, this suggested a high probability that there had been movement in the house. This was worrying, and would continue to be so, and since I fancied a break from the slightly damp Lake District anyway, I decided on balance to take the long drive home, in a warm car, to check on the house – which thankfully turned out to be fine. So I returned to my holiday, but wondered what could explain the data: a 2 degree drop in closed rooms, both upstairs and downstairs, for a one hour period at 9.30pm on 1st September.
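The pattern-matching I was doing by eye can be sketched in code. A minimal version is below; the record layout, the 2 degree threshold and the one hour window are my own illustrative assumptions, not anything the heating provider actually exposes:

```python
from collections import defaultdict
from datetime import timedelta

# Sketch of the suspicious signature: several rooms each cooling by a
# couple of degrees within the same short window. Record layout and
# thresholds are illustrative assumptions, not the provider's API.

def simultaneous_drops(readings, drop_c=2.0, window=timedelta(hours=1),
                       min_rooms=2):
    """readings: iterable of (room, timestamp, temp_c) tuples.
    Returns timestamps at which at least min_rooms rooms each fell
    by drop_c or more within `window` of that time."""
    by_room = defaultdict(list)
    for room, ts, temp in readings:
        by_room[room].append((ts, temp))

    drops = defaultdict(set)                 # timestamp -> rooms that dropped
    for room, series in by_room.items():
        series.sort()
        for i, (t0, temp0) in enumerate(series):
            for t1, temp1 in series[i + 1:]:
                if t1 - t0 > window:
                    break
                if temp0 - temp1 >= drop_c:  # cooled by the threshold amount
                    drops[t0].add(room)
                    break

    return sorted(t for t, rooms in drops.items() if len(rooms) >= min_rooms)
```

Run over a week of records, a rule like this would have flagged the 9.30pm event – and, as it turned out, a passing storm as well, which is precisely the false alarm problem this piece is about.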

I think I found the answer. I asked several people if there had been rain that evening and none had a clear recollection of it – why would they? Then I found that an automated weather station located 15km south of my house had reported thunderstorms at 10.30pm on 1st September. Since a 10km/h northerly wind was also reported, the storm would have taken roughly an hour and a half to cover the 15km, so there is a high chance that it passed over my home at about 9pm that evening. A sudden sharp burst of rain would explain a rapid temperature drop – so the mystery was explained to a high confidence level.
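The back-of-envelope timing behind that conclusion is worth making explicit, using only the figures reported above:

```python
distance_km = 15.0   # weather station sits 15km south of the house
wind_kmh = 10.0      # reported northerly wind, i.e. blowing north-to-south
transit_h = distance_km / wind_kmh   # 1.5 hours from house to station

station_report_h = 22.5              # thunderstorm reported at 10.30pm
over_house_h = station_report_h - transit_h
print(over_house_h)                  # 21.0, i.e. about 9pm over the house
```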

I think my experience highlights some of the challenges we face as we engineer systems to measure and extract information on which a healthcare practitioner or patient may base a decision to intervene.

Is the signal in the data clear and specific – what are the possible causes and the likelihoods? Do we know all the possible causes of the signal or just those we have happened to see so far in the data set we have?

How do we balance the cost and potential harm of the action taken (further diagnostics or treatment) against the likelihood that the signal correctly predicts the event, and against the impact of doing nothing if the prediction turns out to be true?

Do we want to know about non-specific, low-likelihood indications?
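One way to make that balance concrete is the standard Bayes and expected-cost arithmetic. The sketch below uses numbers invented purely for illustration – a rare event, a reasonably good detector, and strongly asymmetric costs:

```python
# All numbers are invented for illustration only.
prevalence = 0.001    # base rate: the event affects 1 in 1000
sensitivity = 0.95    # P(signal | event)
false_alarm = 0.05    # P(signal | no event)

# Bayes: probability the event is real, given that the signal fired.
p_signal = sensitivity * prevalence + false_alarm * (1 - prevalence)
p_event = sensitivity * prevalence / p_signal
print(f"P(event | signal) = {p_event:.3f}")   # ~0.019

# Even a good detector on a rare event mostly produces false alarms.
# Whether to act anyway depends on the asymmetry of the costs:
cost_of_acting = 1.0      # further diagnostics – or a long drive home
cost_of_missing = 200.0   # harm of ignoring a real event

print("act" if cost_of_acting < p_event * cost_of_missing else "ignore")
```

On these numbers the detector is wrong about 98 times in 100, yet acting is still the rational choice – which is some comfort regarding the long drive home.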

As in my case, explanation and learning come from how intelligently we merge disparate datasets. In the example I have given, the realistic explanation was found with a combination of brainpower and cloud/big data sources. Quite literally, it seems.
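The merge itself is mundane once each source has been reduced to timestamped events – the work is in knowing which sources to bring together. A sketch, where the event lists, the dates (year chosen arbitrarily) and the two-hour tolerance are all illustrative:

```python
from datetime import datetime, timedelta

def coincident(events_a, events_b, tolerance=timedelta(hours=2)):
    """Pair up events from two feeds that fall within `tolerance`."""
    return [(a, b) for a in events_a for b in events_b
            if abs(a - b) <= tolerance]

# Flagged temperature event and nearby thunderstorm report (year arbitrary).
house_drops = [datetime(2016, 9, 1, 21, 30)]
storm_reports = [datetime(2016, 9, 1, 22, 30)]

print(coincident(house_drops, storm_reports))   # the pair matches
```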
