[Informatics] Factors that harm accuracy. (Warning - bad cow jokes.)
Mark
mark at vceit.com
Fri Sep 2 15:01:51 AEST 2016
Hi, accurate ones,
Following on from the recent fist fight about accuracy vs correctness,
today I have been pondering...
*ITI U3O2KK05 - "criteria to check the integrity of data including ...
accuracy"*
I started wondering about factors that might affect the *accuracy* of data
or information.
N.B. *Not* 'correctness'. Or 'precision'.
Let's accept that data are an abstract representation of phenomena in the
real world, such as "There are four cows in the top paddock."
The actual cows are the reality.
The recorded number of them is the data.
The data representing this cow fact might be - or become - inaccurate due
to:
- going out of date - the state of the real world has changed (a few new
cows were put into the paddock) but the data has not been updated to
reflect that change.
Or the number of cows was copied from one database to a mirrored site,
but the mirror has not been synchronised recently with the master copy, so
the mirror no longer represents the true current size of the herd
of cows*.
- being damaged - someone accidentally or deliberately changes the text "4
cows" to "40 cows".
Or disk rot caused the recorded data to be misread by the digital system.
- poor initial data collection - the cows were counted by someone glancing
out of the window of a fast-moving car, or the cows kept moving about and
some were counted twice.
- an unreliable data source - there were not four cows: they were goats.
The data came from an idiot in the city who could not tell the difference.
- bias - the cow-counter had some reason to misrepresent the true number of
cows in the top paddock, e.g. to reduce his tax bill, or to impress the
milk maid next door.
- faulty data processing - the wrong formula was used in the spreadsheet
that added up the number of cows.
- poor validation - the number of cows was entered into the software as
"four" instead of "4". The software was not expecting text, and - because
there were no recognisable digits - its *VAL(numCows)* function converted
the word "four" into a numeric value of zero.
- translation or conversion errors - the reader of the data did not speak
English and relied on an incorrect electronic translation of the text.
Or the text said "There are 4 (four) cows" and a careless data entry person
entered "There are 4 (4) cows" which was later turned into "There are 44
cows".
Or poor optical character recognition of "4 cows" became "9 cows".
True story: I was listening to an audiobook about the history of space
travel and was astounded to hear that "Apollo Two landed on the moon." It
took me a while to realise that the text must have been "Apollo 11 landed
on the moon" but the narrator mistook "11" for the Roman numerals "II".
Sadly, no cows were involved.
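For anyone who wants to demonstrate the poor-validation case in class, here is a minimal Python sketch. It assumes a BASIC-style VAL() that reads any leading digits and quietly falls back to zero when it finds none. The names lenient_val and strict_cow_count are made up for illustration and do not come from any real package.

```python
import re


def lenient_val(text):
    """Mimic an assumed BASIC-style VAL(): use leading digits, else return 0."""
    match = re.match(r"\s*([0-9]+)", text)
    return int(match.group(1)) if match else 0


def strict_cow_count(text):
    """Hypothetical validator: accept only whole numbers written in digits."""
    cleaned = text.strip()
    if not re.fullmatch(r"[0-9]+", cleaned):
        raise ValueError(f"expected digits for the cow count, got {text!r}")
    return int(cleaned)


print(lenient_val("4"))     # 4
print(lenient_val("four"))  # 0 - four cows have silently vanished

try:
    strict_cow_count("four")
except ValueError as error:
    print("Rejected:", error)
```

The lenient version never complains, so the inaccurate zero flows on into later processing. The strict version rejects the entry at the point of input, which gives the user a chance to re-enter "4".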
Can anyone think of other interesting ways in which data/info accuracy may
be reduced?
Regards,
Mark
Notes:
1. This post is cow-neutral. I have NO bovine agenda. It is true that I did
help milk cows when I lived on a dairy farm in Yarrawalla while teaching in
Boort (1980-1985) but there was NO inappropriate activity, *especially*
with cow 1056. Rumours to the contrary are udderly untrue, in spite of what
Doris might moo.
2. To comply with the federal *Cow Comedy Act (1953)*, I am obliged to
refer you to the anthem: Cows with Guns
<https://www.youtube.com/watch?v=FQMbXvn2RNI>.
* Herd of cows? Of course I've heard of cows. Bad cow pun.
--
Mark Kelly
mark at vceit.com
http://vceit.com