Epistles, Enquiry and Ethics

Over the past few weeks I have been thinking a lot about the data that we have access to, and the attitudes that we have towards our own data and that of others. When I started working in the NHS in the early nineties there was a paucity of information, with many decisions being made based solely on the, often unsubstantiated, views of others who had ‘gone before’. Other decisions were taken based on qualitative feedback from small research panels, where there was no evidence that such feedback was generalisable. Since then, however, I have seen the ongoing take-up of evidence-based medicine, and the associated requirement for critical appraisal skills. I have seen poor-quality data from hospitals that no one ever looked at turn into a rich source for commissioning, revalidation, research and more. I have seen the scope of data collected expand from basic data from hospitals to a wealth of detailed data from primary care, secondary care, mental health services, community services, social care… and the list goes on. But it is not only in health and social care that we see this change. There is so much data flowing around the system now: social media data, health data from apps, shopping data collected via store cards, credit data, finance data. It is probably easier now to identify the information that isn’t shared with others than the data that is.

But, of course, all of this data requires a mindset change. We are no longer looking at small datasets, but at massive amounts of data from multiple sources. There has been a lot written about big data, thick data, small data, data management, data science, data pools, data lakes, data warehouses, data clouds… the list is endless. I don’t want to revisit any of these, as others know far more about these things. I do, however, want to think about all of this data under some familiar headings.

Who, what, why, where and when?

Who should analyse data? I’ve blogged a lot about having appropriately skilled people analysing data. It’s something that I feel very strongly about, which is why I teach my analytical courses. For some reason, we do not put the same importance on our private data as we do on financial data. Most companies wouldn’t dream of letting just anyone run amok with their financial information, and would hire qualified accountants and financial experts to look at, and analyse, it. But surely there is as much risk in analysing other types of data, particularly when the results of such analysis often lead to financial decisions (or even decisions about patient care)? Why then do we think that anyone can play about with data? For more on this topic, have a look at my blogs entitled ‘Magna est Veritas’ and ‘Everyone’s an Expert’.

I will look at some of the other headings here in future blogs, but I want to concentrate now on the what. What data should we be looking at, given there is so much of it?

I remember a few years ago asking my organisation this exact question. What data should we be analysing? Shouldn’t we think about the ethics of what we can now do? Not surprisingly, the response I was given was that we would do things that were legal, and not do things that weren’t legal. That never sat right with me, and I never stopped raising the issue. Of course, since then there have been a number of high-profile stories about how data has been harvested and used, and these have led people to think about data in a different way. This, in turn, has led to big organisations rethinking how they use the data they hold and collect. We have the legal right to do many things with data, but it is clear that this, of itself, is not enough.

For the theological among you, there is a great quote from St Paul in his first letter (epistle) to the church at Corinth.

“I have the right to do anything” you say – but not everything is beneficial. “I have the right to do anything” – but not everything is constructive.

This, to me, is key in how we deal with data. Yes, legally we can do a lot of things with data. Yes, we can publish privacy notices, be transparent about what we are doing, perform privacy impact assessments and ensure we are within the boundaries of Data Protection and GDPR – but still end up doing something that is not beneficial either to us or to others. And here’s the rub: as data quantity and granularity increase, it is assumed that people’s mindsets and skills will naturally change with them. If I were a surgeon and many new techniques were developed, I would expect to receive new training both on the techniques and on how to decide which technique to use and when. Often a clinician will decide that even though a technique could be used, it would not be in the patient’s best interest. Unfortunately, with data, there is little structured training on new techniques, and even less on which techniques to use, and when (and when not) to use them. The data ethics discussion, to my mind, has been organisation-focused and has concentrated on what data could / should be collected and linked. The challenge is to move this to the individual – for everyone dealing with data to think about and answer some simple questions before launching into analysis.

Is this legal? Is it in line with published privacy information? Has consent been given for data to be used in this way?
If it is legal, should I be looking at this data? Do I have permission?
Am I the best person to look at this data? Do I have the right analytical skills, statistical skills and knowledge of the intricacies of the data?
What extra training / skills / information do I need to do this properly? Can this wait until I have those skills?
What is the question I want to answer? Is it really that important?
Who does this answer benefit? (If it is not the data subjects, then some real thought has to be given here.)
What will be the impact of a wrong analysis? What is the risk of harm? Can we mitigate this?
If this was my data, would I be happy for it to be used in this way? If not, why am I happy using others’ data like this?
Is the anticipated answer worth the amount of time and effort to do the analysis? Are the benefits worth the costs?
Is the anticipated answer something that can be shared? Will it cause discomfort and is it worth it? Will it affect the willingness of individuals to share data in the future?

This is just my list, and I am sure there are other questions that could be asked instead. The thing is that none of this is rocket science, and it concerns me that in many cases these types of questions are not asked or, worse, the answers are ignored.

There was a phrase bandied around before the UK firmed up its research ethics. It was:

“For the good of medical science”

The implication was that it didn’t matter if one or two people were harmed, or suffered discomfort, if the majority benefited. (It is a very similar belief to that of ‘collateral damage’, which we have heard so much about in recent conflicts. Again, a few people dying so that more can benefit has been seen as acceptable.) Certainly, views on how research is practised have changed considerably, and ‘for the good of medical science’ is no longer a valid argument. I don’t know about you, but I think we need to approach our data and related analysis in the same way.

Of course, the implication here is that we should train our analysts, statisticians and data scientists in these sorts of data ethics considerations, and consequently data access should be limited to those who have this understanding.

For those of us who already deal with data, let’s make consideration of data ethics as important as the other skills that we practise.