[Informatics] Authentication Records for Informatics
Stephen Trouse
Stephen.Trouse at flinders.vic.edu.au
Fri Apr 15 11:24:25 AEST 2016
Hi Everyone,
I have the authentication record and assessment sheet for Software Dev but not for Informatics. Does anyone know where I can find those?
Stephen
From: informatics-bounces at edulists.com.au [mailto:informatics-bounces at edulists.com.au] On Behalf Of Mark
Sent: Thursday, 14 April 2016 2:03 PM
To: Year 12 VCE Informatics Teachers' Mailing List <informatics at edulists.com.au>
Subject: [Informatics] Some problems with statistics [LONG]
Hi all. For those of you looking for case studies of bad statistical use, I found an interesting read in this...
'STATISTICS DONE WRONG - THE WOEFULLY COMPLETE GUIDE' by Alex Reinhart
no starch press - info at nostarch.com - www.nostarch.com
ISBN-10: 1-59327-620-6 ISBN-13: 978-1-59327-620-1
Some rather long but thought-provoking excerpts from the book may be useful for you and your kids when evaluating statistical data during hypothesis research.
In brief: there are many problems to be found in published research data.
---------------
The problem of rejecting valid conclusions because of unimportant errors...
A conclusion supported by poor statistics can still be correct — statistical and logical errors do not make a conclusion wrong, but merely unsupported.
The problem of only publishing exciting findings...
We only ever see a fraction of medical research, for instance, because few scientists bother publishing “We Tried This Medicine and It Didn’t Seem to Work.” In addition, editors of prestigious journals must maintain their reputation for groundbreaking results, and peer reviewers are naturally prejudiced against negative results. When presented with papers with identical methods and writing, reviewers grade versions with negative results more harshly and detect more methodological errors.
The pharmaceutical industry seems particularly tempted to bias evidence by neglecting to publish studies that show their drugs do not work; subsequent reviewers of the literature may be pleased to find that 12 studies indicate a drug works, without knowing that 8 other unpublished studies suggest it does not. Of course, it’s likely that such results would not be published by peer-reviewed journals even if they were submitted—a strong bias against unexciting results means that studies saying “it didn’t work” never appear and other researchers never see them. Missing data and publication bias plague science, skewing our perceptions of important issues.
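(An aside from me, not from the book: if you want to demonstrate publication bias to the kids, a small Python simulation makes it concrete. Everything below is invented; the drug has no real effect, and the only real thing on show is the selection effect.)

import random

# Simulate 1000 small trials of a drug that has NO real effect, then look at
# what the literature shows if only the "significant" results get published.
random.seed(1)

def fake_trial(n=30):
    treat = [random.gauss(0, 1) for _ in range(n)]
    ctrl = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(treat) / n - sum(ctrl) / n
    se = (2.0 / n) ** 0.5                      # standard error of the difference
    return diff, abs(diff) > 2 * se            # crude two-standard-error cutoff

results = [fake_trial() for _ in range(1000)]
published = [d for d, significant in results if significant]

print("trials run:                 ", len(results))
print("'significant' and published:", len(published))    # roughly 5% flukes
print("average |effect|, all trials:     %.3f" % (sum(abs(d) for d, _ in results) / len(results)))
print("average |effect|, published only: %.3f" % (sum(abs(d) for d in published) / len(published)))
# The published subset looks like a collection of solid effects even though the
# true effect is exactly zero.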
The problem with small sample sizes...
In the United States, counties with the lowest rates of kidney cancer tend to be Midwestern, Southern, and Western rural counties. Why might this be? Maybe rural people get more exercise or inhale less-polluted air. Or perhaps they just lead less stressful lives.
On the other hand, counties with the highest rates of kidney cancer tend to be Midwestern, Southern, and Western rural counties.
The problem, of course, is that rural counties have the smallest populations. A single kidney cancer patient in a county with 10 residents gives that county the highest kidney cancer rate in the nation. Small counties hence have much more variation in kidney cancer rates simply because they have so few residents.
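(Another aside, not from the book: the small-sample effect is easy to simulate. The counties below are invented and every resident has exactly the same risk; only the populations differ.)

import random

# 50 tiny counties (100 people) and 50 large ones (10,000 people), all with an
# identical 0.5% underlying kidney cancer risk. Which counties look extreme?
random.seed(42)

counties = [("small-%d" % i, 100) for i in range(50)] + \
           [("large-%d" % i, 10000) for i in range(50)]

rates = []
for name, population in counties:
    cases = sum(random.random() < 0.005 for _ in range(population))
    rates.append((cases / population, name))

rates.sort()
print("lowest observed rates: ", rates[:5])
print("highest observed rates:", rates[-5:])
# Both ends of the ranking are dominated by the 100-person counties, purely
# because a single case swings a tiny county's rate so much.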
The problem with false positives that sound exciting...
http://xkcd.com/882
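(That link is the xkcd "green jelly beans cause acne" strip. My aside: the arithmetic behind it is worth showing explicitly. Test enough hypotheses at the 5% level and an exciting false positive is almost guaranteed.)

# Chance of at least one false positive when testing k independent, truly null
# hypotheses at the 5% significance level: 1 - 0.95**k
for k in (1, 5, 20, 100):
    print("%3d tests -> %2.0f%% chance of at least one 'significant' fluke"
          % (k, 100 * (1 - 0.95 ** k)))
# 20 tests (20 jelly bean colours) already gives about a 64% chance.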
The problem with Correlation and Causation
When you have used multiple regression to model some outcome—like the probability that a given person will suffer a heart attack, given that person's weight, cholesterol, and so on — it’s tempting to interpret each variable on its own. You might survey thousands of people, asking whether they’ve had a heart attack and then doing a thorough physical examination, and produce a model. Then you use this model to give health advice: lose some weight, you say, and make sure your cholesterol levels fall within this healthy range. Follow these instructions, and your heart attack risk will decrease by 30%!
But that's not what your model says. The model says that people with cholesterol and weight within that range have a 30% lower risk of heart attack; it doesn’t say that if you put an overweight person on a diet and exercise routine, that person will be less likely to have a heart attack. You didn't collect data on that! You didn't intervene and change the weight and cholesterol levels of your volunteers to see what would happen.
There could be a confounding variable here. Perhaps obesity and high cholesterol levels are merely symptoms of some other factor that also causes heart attacks; exercise and statin pills may fix them but perhaps not the heart attacks.
The regression model says lower cholesterol means fewer heart attacks, but that's correlation, not causation.
One example of this problem occurred in a 2010 trial testing whether omega-3 fatty acids, found in fish oil and commonly sold as a health supplement, can reduce the risk of heart attacks. The claim that omega-3 fatty acids reduce heart attack risk was supported by several observational studies, along with some experimental data. Fatty acids have anti-inflammatory properties and can reduce the level of triglycerides in the bloodstream—two qualities known to correlate with reduced heart attack risk. So it was reasoned that omega-3 fatty acids should reduce heart attack risk.
But the evidence was observational. Patients with low triglyceride levels had fewer heart problems, and fish oils reduce triglyceride levels, so it was spuriously concluded that fish oil should protect against heart problems. Only in 2013 was a large randomized controlled trial published, in which patients were given either fish oil or a placebo (olive oil) and monitored for five years. There was no evidence of a beneficial effect of fish oil.
Another problem arises when you control for multiple confounding factors. It’s common to interpret the results by saying, “If weight increases by one pound, with all other variables held constant, then heart attack rates increase by...” Perhaps that is true, but it may not be possible to hold all other variables constant in practice. You can always quote the numbers from the regression equation, but in reality the act of gaining a pound of weight also involves other changes. Nobody ever gains a pound with all other variables held constant, so your regression equation doesn’t translate to reality.
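(One more aside from me: a toy simulation of a confounder makes the correlation-versus-causation point vivid. Everything below is invented; in this little world cholesterol does nothing at all, yet the observational comparison makes it look dangerous.)

import random

# A hidden factor raises BOTH cholesterol and heart attack risk; cholesterol
# itself has no causal effect in this toy world.
random.seed(0)

people = []
for _ in range(100000):
    hidden = random.gauss(0, 1)                          # the unmeasured confounder
    cholesterol = 5.0 + 0.8 * hidden + random.gauss(0, 0.5)
    attack = random.random() < max(0.0, 0.05 + 0.03 * hidden)
    people.append((cholesterol, attack))

people.sort()                                            # order by cholesterol
half = len(people) // 2

def attack_rate(group):
    return sum(had_attack for _, had_attack in group) / len(group)

print("attack rate, low-cholesterol half:  %.3f" % attack_rate(people[:half]))
print("attack rate, high-cholesterol half: %.3f" % attack_rate(people[half:]))
# The high-cholesterol half has noticeably more attacks, yet putting anyone on a
# diet here would change nothing: attacks depend only on the hidden factor.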
The problem of Simpson's Paradox
When statisticians are asked for an interesting paradoxical result in statistics, they often turn to Simpson’s paradox. Simpson's paradox arises whenever an apparent trend in data, caused by a confounding variable, can be eliminated or reversed by splitting the data into natural groups. There are many examples of the paradox, so let me start with the most popular.
In 1973, the University of California, Berkeley, received 12,763 applications for graduate study. In that year’s admissions process, 44% of male applicants were accepted but only 35% of female applicants were. The university administration, fearing a gender discrimination lawsuit, asked several of its faculty to take a closer look at the data.
Graduate admissions, unlike undergraduate admissions, are handled by each academic department independently. The initial investigation led to a paradoxical conclusion: of 101 separate graduate departments at Berkeley, only four showed a statistically significant bias against admitting women. At the same time, six departments showed a bias against men, which was more than enough to cancel out the deficit of women caused by the other four departments.
How could Berkeley as a whole appear biased against women when individual departments were generally not? It turns out that men and women did not apply to all departments in equal proportion. For example, nearly two-thirds of the applicants to the English department were women, while only 2% of mechanical engineering applicants were. Furthermore, some graduate departments were more selective than others.
These two factors accounted for the perceived bias. Women tended to apply to departments with many qualified applicants and little funding, while men applied to departments with fewer applicants and surpluses of research grants. The bias was not at Berkeley, where individual departments were generally fair, but further back in the educational process, where women were being shunted into fields of study with fewer graduate opportunities.
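(My aside: the reversal is easy to reproduce with two departments and invented numbers; these are not the real Berkeley figures, but the mechanism is the same.)

# Invented admissions numbers (not the real Berkeley data) showing Simpson's paradox.
# Each tuple: (department, men applied, men admitted, women applied, women admitted)
departments = [
    ("Engineering", 900, 540, 100, 65),   # easy to get into; mostly male applicants
    ("English",     100,  15, 900, 150),  # hard to get into; mostly female applicants
]

def pct(admitted, applied):
    return 100.0 * admitted / applied

totals = [0, 0, 0, 0]
for name, m_app, m_adm, w_app, w_adm in departments:
    totals = [t + v for t, v in zip(totals, (m_app, m_adm, w_app, w_adm))]
    print("%-12s men %5.1f%%   women %5.1f%%" % (name, pct(m_adm, m_app), pct(w_adm, w_app)))

print("%-12s men %5.1f%%   women %5.1f%%"
      % ("OVERALL", pct(totals[1], totals[0]), pct(totals[3], totals[2])))
# Within each department women are admitted at the HIGHER rate, but because most
# women applied to the tougher department, the pooled female rate (21.5%) looks
# far worse than the pooled male rate (55.5%).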
The problem with making mistakes
Surveys of statistically significant results reported in medical and psychological trials suggest that many p values are wrong and some statistically insignificant results are actually significant when computed correctly. Even the prestigious journal Nature isn’t perfect, with roughly 38% of papers making typos and calculation errors in their p values. Other reviews find examples of misclassified data, erroneous duplication of data, inclusion of the wrong dataset entirely, and other mix-ups, all concealed by papers that did not describe their analysis in enough detail for the errors to be easily noticed.
The problem of data decay when seeking to verify the data used in previous research
Another problem is the difficulty of keeping track of data as computers are replaced, technology becomes obsolete, scientists move to new institutions, and students graduate and leave labs. If a dataset is no longer in use by its creators, they have little incentive to maintain a carefully organized archive of it, particularly when the data has to be reconstructed from floppy disks and filing cabinets. One study of 516 articles published between 1991 and 2011 found that the probability of data being available decayed over time; for papers more than 20 years old, fewer than half of the datasets were available. Some authors could not be contacted because their email addresses had changed; others replied that they probably had the data, but it was on a floppy disk and they no longer had a floppy drive, or that the data had been on a stolen computer or was otherwise lost.
Regards, Mark
with thanks to
'STATISTICS DONE WRONG - THE WOEFULLY COMPLETE GUIDE' by Alex Reinhart
--
Mark Kelly
mark at vceit.com
http://vceit.com