Can data lie?

data science
statistics
Author

Ndze’dzenyuy Lemfon K.

Published

December 1, 2022

People often say that data does not lie, and they are right. But what about our interpretations of data, given, of course, that data cannot speak for itself?

The British Prime Minister Benjamin Disraeli is famously credited with saying, “There are three kinds of lies: lies, damned lies, and statistics” (the attribution was popularized by Mark Twain and remains disputed). Disraeli’s political zenith came at a time when Britain was still enjoying the benefits of the Scottish and English Enlightenments, which emphasized the importance of logic, proper reasoning, and the right to think freely instead of being drowned in dogmatic tradition and what many called blind faith. As a politician with the tricky job of convincing the electorate, Disraeli likely knew a thing or two about statistics and how to use them to sway public opinion. It is not far-fetched to suggest that his comment about statistics was a jibe at his opponents.

The times in which we live are not significantly different. Data is everywhere, and it will run (if it does not already) most of the systems that are fundamental to our daily lives. I listened to a panel discussion the other day, and one of the speakers said that the challenge data scientists face is convincing “old school” managers that data does not lie.

Well-intentioned as the speaker may have been, their argument raises an important question. Data may be unable to lie (more because it is inanimate than because it has a moral conscience), but can those who use data to convince others lie? How much should we trust data and statistics when they are used against us to counter our “old school” tendency to live by intuition and feeling?

That question, by itself, is not one we can answer thoroughly. Properly understood, data and statistics are simply the most popular source of ethos (for those who may have forgotten their lectures on rhetoric, ethos, as used in this context, is the appeal to authority or credibility in oratory). Just as quotes from Barack Obama or Pope Francis can be carefully selected to support entirely unreasonable positions (positions the Pope and Obama would hardly agree with), a skilled expert can carefully use a given data set to support very illogical positions. Data by itself may always be honest, but humans often use it as a tool for persuasion.

I once read somewhere that accounting is storytelling, and data science may very well be the same. So, in dealing with data, and with statistics in general, do not be blinded by the assumed objectivity of mathematics and its processes. Remember that while measuring the temperature at the same time every day for ten years can give us a handy data set, the next step of interpreting that data is rife with sources of error and limitations. Thus, while the data set may be authoritative in terms of how much it captures and for how long, that authority should not confer infallibility on any interpretation of it. Rather, every interpretation should be assumed fallible precisely because of what it is: an interpretation.
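To make this concrete, here is a small Python sketch built on a hypothetical ten-year series of yearly mean temperatures. The numbers are invented purely for illustration and are not real measurements, and the helper `slope` is likewise hypothetical. It fits an ordinary least-squares trend to the full series and then to a hand-picked four-year window. Every value is left untouched, yet one reading shows warming and the other shows cooling.

```python
# Hypothetical yearly mean temperatures (degrees C), invented for illustration only;
# these are NOT real measurements.
years = list(range(2012, 2022))  # 2012 through 2021, ten years
temps = [14.0, 14.2, 14.1, 14.5, 14.4, 14.3, 14.2, 14.6, 14.9, 15.0]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Interpretation 1: use all ten years.
full_trend = slope(years, temps)

# Interpretation 2: cherry-pick 2015-2018 (list indices 3 to 6), where values happen to dip.
picked_trend = slope(years[3:7], temps[3:7])

print(f"2012-2021 trend: {full_trend:+.3f} C/year")    # → 2012-2021 trend: +0.093 C/year
print(f"2015-2018 trend: {picked_trend:+.3f} C/year")  # → 2015-2018 trend: -0.100 C/year
```

Nothing in the data set is false here; the misleading step is the choice of window, which is exactly the kind of interpretive decision where error (or persuasion) creeps in.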

There was much more worth discussing from this panel discussion, but I think I have done an excellent job of summarising the most important things for you!
