Saturday, 3 September 2016

Some thoughts on the Statcheck project



Yesterday, a piece in Retraction Watch covered a new study, in which the results of automated statistics checks on 50,000 psychology papers are to be made public on the PubPeer website.
I had advance warning, because a study of mine had been included in what was presumably a dry run, and this led to me receiving an email on 26th August as follows:
Assuming someone had a critical comment on this paper, I duly clicked on the link, and had a moment of double-take when I read the comment.
Now, this seemed like overkill to me, and I posted a rather grumpy tweet about it. There was a bit of to and fro on Twitter with Chris Hartgerink, one of the researchers on the Statcheck project, and with the folks at Pubpeer, where I explained why I was grumpy and they defended their approach; as far as I was concerned it was not a big deal, and if nobody else found this odd, I was prepared to let it go.
But then a couple of journalists got interested, and I sent them some more detailed thoughts.
I was quoted in the Retraction Watch piece, but I thought it worth reporting my response in full here, because the quotes could be interpreted as indicating I disapprove of the Statcheck project and am defensive about errors in my work. Neither of those is true. I think the project is an interesting piece of work; my concern is solely with the way in which feedback to authors is being implemented. So here is the email I sent to journalists in full:
I am in general a strong supporter of the reproducibility movement and I agree it could be useful to document the extent to which the existing psychology literature contains statistical errors.
However, I think there are 2 problems with how this is being done in the PubPeer study.
1. The tone of the PubPeer comments will, I suspect, alienate many people. As I argued on Twitter, I found it irritating to get an email saying a paper of mine had been discussed on PubPeer, only to find that this referred to a comment stating that zero errors had been found in the statistics of that paper.
I don't think we need to be told that - by all means report somewhere a list of the papers that were checked and found to be error-free, but you don't need to personally contact all the authors and clog up PubPeer with comments of this kind.
My main concern was that during an exceptionally busy period, this was just another distraction from other things. Chris Hartgerink replied that I was free to ignore the email, but that would be extremely rash because a comment on PubPeer usually means that someone has a criticism of your paper.
As someone who works on language, I also found the pragmatics of the communication non-optimal. If you write and tell someone that you've found zero errors in their paper, the implication is that this is surprising, because you don't go around stating the obvious*. And indeed, the final part of the comment basically said that your work may well have errors in it, and even though they hadn't found any, it couldn't be trusted.
Now at the same time as having that reaction, I appreciate that this was a computer-generated message, written by non-native English speakers, that I should not take it personally, and that no slur on my work was intended. And I would like to know if errors were found in my stats; it is entirely possible that there are some, since none of us is perfect. So I don't want to over-react, but I think that if I, as someone basically sympathetic to this agenda, was irritated by the style of the communication, then the odds are it will stoke real hostility among those who are already dubious about what has been termed 'bullying' and so on by people interested in reproducibility.
2. I'll be interested to see how this pans out for people where errors are found.
My personal view is that the focus should be on errors that do change the conclusions of the paper.
I think at least a sample of these should be hand-checked so that we have some idea of the error rate. I'm not sure whether this has been done, but the PubPeer comment certainly gave no indication of it: it basically said there's probably an error in your stats, but we can't guarantee that there is, putting the onus on the author to check it out.
If it's known that on 99% of occasions the automated check is accurate, then fine. If the accuracy is only 90% I'd be really unhappy about the current process as it would be leading to lots of people putting time into checking their papers on the basis of an insufficiently sensitive diagnostic. It would make the authors of the comments look frankly lazy in stirring up doubts about someone's work and then leaving them to check it out.
In epidemiology the terms sensitivity and specificity are used to refer to the accuracy of a diagnostic test. Minimally if the sensitivity and specificity of the automated stats check is known, then those figures should be provided with the automated message.
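As an aside, the sensitivity/specificity point can be made concrete with Bayes' rule. The numbers in the sketch below are purely hypothetical, not figures from the Statcheck project; they simply illustrate that even a fairly accurate flagging tool generates many false alarms when genuine errors are rare.

```python
# Hypothetical illustration: what fraction of flagged papers would really
# contain an error, given assumed sensitivity, specificity and base rate.
# None of these numbers come from the actual Statcheck project.

def positive_predictive_value(sensitivity, specificity, base_rate):
    """Probability that a flagged paper truly contains an error (Bayes' rule)."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Suppose 10% of papers contain a reporting error, and the checker has
# 99% sensitivity but only 90% specificity.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.90, base_rate=0.10)
print(f"Probability a flagged paper is really in error: {ppv:.2f}")
```

On these made-up figures, roughly half of all flags would be false alarms, which is why the error rate of the automated check matters so much.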

The above was written before Dalmeet drew my attention to the second paper, in which errors had been found. Here’s how I responded to that:

I hadn't seen the 2nd paper - presumably because I was not the corresponding author on that one. It's immediately apparent that the problem is that F ratios have been reported with one degree of freedom, when there should be two. In fact, it's not clear how the automated program could assign any p-value in this situation.
I'll communicate with the first author, Thalia Eley, about this, as it does need fixing for the scientific record, but, given the sample size (on which the second, missing, degree of freedom is based), the reported p-values would appear to be accurate.
I have added a comment to this effect on the PubPeer site.


* I was thinking here of Gricean maxims, especially maxim of relation. 

Thursday, 1 September 2016

Why I still use Excel



The Microsoft application, Excel, was in the news for all the wrong reasons last week. A paper in Genome Biology documented how numerous scientific papers had errors in their data because they had used default settings in Excel, which had unhelpfully converted gene names to dates or floating point numbers. This was hard to spot, as it didn't happen to all gene names, but, for instance, the gene Septin 2, with acronym SEPT2, would be turned into 2006/09/02. This is not new: this paper in 2004 documented the problem, but it seems many people weren't aware of it, and it is now estimated that the literature on genetics is riddled with errors as a consequence.
This isn't the only way Excel can mess up your data. If you want to enter a date, you need to be very careful to ensure you have the correct setting. If you are in the UK and you enter a date like 23/4/16, then it will be correctly entered as 23rd April, regardless of the setting. But if you enter 12/4/16, it will be treated as 4th December if you are on US settings and as 12th April if you are on UK settings.
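The same ambiguity can be demonstrated outside Excel. In this minimal Python sketch (just an illustration, nothing from the original post), the string "12/4/16" parses to two different dates depending on the assumed convention, which is one reason an unambiguous ISO format such as 2016-04-12 is safer for data entry:

```python
from datetime import datetime

raw = "12/4/16"

# UK convention: day/month/year
uk = datetime.strptime(raw, "%d/%m/%y")
# US convention: month/day/year
us = datetime.strptime(raw, "%m/%d/%y")

print(uk.strftime("%Y-%m-%d"))  # 2016-04-12 (12th April)
print(us.strftime("%Y-%m-%d"))  # 2016-12-04 (4th December)
```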
Then there is the dreaded autocomplete function. This can really screw things up by assuming that if you start typing text into a cell, you want it the same as a previous entry in that column that begins with the same sequence of letters. It can be a boon and a time-saver in some circumstances, but a way to introduce major errors in others.
I've also experienced odd bugs in Excel's autofill function, which makes it easy to copy a formula across columns or rows. It's possible for a file to become corrupted so that the cells referenced in the formula are wrong. Such errors are also often introduced by users, but I've experienced corrupted files containing formulae, which is pretty scary.
The response to this by many people is to say serious scientists shouldn't use Excel.  It's just too risky having software that can actively introduce errors into your data entry or computations. But people, including me, persist in using it, and we have to consider why.
So what are the advantages of keeping going with Excel?
Well, first, it usually comes bundled with Microsoft computers, so it is widely available free of charge*. This means most people will have some familiarity with it, though few bother to learn how to use it properly.
Second, you can scan a whole dataset easily: scrolling through rows or columns is very direct. You can use Freeze Panes to keep column and row headers static, and you can hide columns or rows that you don't want getting in the way.
Third, you can format a worksheet to facilitate data entry. A lot of people dismiss colour coding of columns as prettification, but it can help ensure you keep the right data in the right place. Data validation is easily added and can ensure that only valid values are entered.
Fourth, you can add textual comments – either as a row in their own right, or using the Comment function.
Fifth, you can very easily plot data. Better still, you can do so dynamically, as it is easy to create a plot and then change the data range it refers to.
Sixth, you can use lookup functions. In my line of work we need to convert raw scores to standard scores based on normative data. This is typically done using tables of numbers in a manual, which makes it very easy to introduce human error. I have found it is worth investing time to get the large table of numbers entered as a separate worksheet, so we can then automate the lookup functions.
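The same raw-to-standard conversion can be reproduced outside Excel. Here is a minimal Python sketch with made-up normative values (real norms tables are much larger and usually stratified by age band):

```python
# Hypothetical norms table: raw score -> standard score.
# These values are invented for illustration; real normative tables
# are larger and typically age-banded.
NORMS = {0: 55, 1: 62, 2: 70, 3: 78, 4: 85, 5: 92, 6: 100, 7: 108, 8: 115}

def standard_score(raw):
    """Look up a standard score; fail loudly on out-of-range raw scores."""
    try:
        return NORMS[raw]
    except KeyError:
        raise ValueError(f"Raw score {raw} is outside the normative table")

print(standard_score(6))  # 100
```

Automating the lookup this way removes the human error involved in reading values off a printed table, which is exactly the advantage of doing it in a worksheet.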
Many of my datasets are slowly generated over a period of years: we gather large amounts of data on individuals, record responses on paper, and then enter the data as it comes in. The people doing the data entry are mostly research assistants who are relatively inexperienced. So having a very transparent method of data entry, which can include clear instructions on the worksheet, and data validation, is important. I'm not sure there are other software options that would suit my needs.
But I'm concerned about errors and need strategies to avoid them. So here are the working rules I have developed so far.
1. Before you begin, turn off any fancy Excel defaults you don't need. And if entering gene names, ensure they are entered as text.
2. Double data entry is crucial: have the data re-entered from scratch when the whole dataset is in, and cross-check the data files. This costs money but is important for data quality. There are always errors.
3. Once you have the key data entered and checked, export it to a simple, robust format such as tab-separated text. It can then be read and re-used by people working with other packages.
4. The main analysis should be done using software that generates a script that means the whole analysis can be reproduced. Excel is therefore not suitable. I increasingly use R, though SPSS is another option, provided you keep a syntax file.
5. I still like to cross-check analyses using Excel – even if it is just to do a quick plot to ensure that the pattern of results is consistent with an analysis done in R.  
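The cross-check in rule 2 is easy to automate once both versions of the data have been exported as tab-separated text (rule 3). A minimal sketch, assuming the two files have identical layout:

```python
import csv

def cross_check(file_a, file_b):
    """Compare two tab-separated files cell by cell; return (row, col, a, b) mismatches."""
    with open(file_a, newline="") as fa, open(file_b, newline="") as fb:
        rows_a = list(csv.reader(fa, delimiter="\t"))
        rows_b = list(csv.reader(fb, delimiter="\t"))
    if len(rows_a) != len(rows_b):
        raise ValueError("Files differ in number of rows")
    mismatches = []
    for i, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        for j, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                mismatches.append((i + 1, j + 1, a, b))
    return mismatches
```

Each mismatch then gets resolved against the original paper record, which is where the real data-quality gain of double entry comes from.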
Now, I am not an expert data scientist – far from it. I'm just someone who has been analysing data for many years and learned a few things along the way. Like most people, I tend to stick with what I know, as there are costs in mastering new skills, but I will change if I can see benefits. I've become convinced that R is the way to go for data analysis, but I do think Excel still has its uses, as a complement to other methods for storing, checking and analysing data. But, given the recent crisis in genetics, I'd be interested to hear what others think about optimal, affordable approaches to data entry and data analysis – with or without Excel.

*P.S.  I have been corrected on Twitter by people who have told me it is NOT free; the price for Microsoft products may be bundled in with the cost of the machine, but someone somewhere is paying for it!

Update: 2nd September 2016
There was a surprising amount of support for this post on Twitter, mixed in with anticipated criticism from those who just told me Excel is rubbish. What's interesting is that very few of the latter group could suggest a usable alternative for data entry (and some had clearly not read my post and thought I was advocating using Excel for data analysis). And yes, I don't regard Access as a usable alternative: been there, tried that, and it just induced a lot of swearing.
There was, however, one suggestion that looks very promising and which I will chase up: @stephenelane suggested I look at REDCap.
Website here: https://projectredcap.org/

Meanwhile, here's a very useful link, which came in via @tjmahr on Twitter, on setting up Excel worksheets to avoid later problems:
http://kbroman.org/dataorg/
 

Saturday, 6 August 2016

Alternative providers and alternative medicine

Jo Johnson thinks that the market in Higher Education is unfair. There are currently stringent procedures in place for any institution that wants to award degrees and call itself a University. One facet of this is the requirement that any new provider must initially have its degrees validated by an established University. The University of Suffolk, which gained University title this month, provides an example of how this can work, but many of those seeking to enter the market are unhappy with current arrangements. Roxanne Stockwell, Principal of Pearson College, complained: “Until an institution has its own degree-awarding powers, it cannot offer degrees without being validated by an existing university. Under the current system, a new partner has to find a willing validating partner, and it is locked out if it cannot.”

Jo Johnson, in his role as Minister for Universities and Science last year, criticised this arrangement. He memorably said: “I know some validation relationships work well, but the requirement for new providers to seek out a suitable validating body from amongst the pool of incumbents is quite frankly anti-competitive. It’s akin to Byron Burger having to ask permission of McDonald’s to open up a new restaurant. .....I can announce that we will shortly be lifting the moratorium that has been in place for applications for new Degree Awarding Powers and for University Title. Once again, we are opening the doors to new entrants and challenger institutions, all in the interest of increasing the choices available to students.”

So how wide should the door be opened? This raises deep questions about what constitutes a University. Currently, British Universities have a strong international reputation. Historically, they have been subject to strict scrutiny in return for receiving government funding. They combine research and teaching to push forward the boundaries of knowledge, and have trained students to value knowledge for its own sake, not just as a means to an end.

Not everyone wants that kind of education: some students do not enjoy formal academic study and may prefer vocational courses or apprenticeships. It is important that our higher education system caters for them. The question confronting us now is how far we should extend the definition of a University. It is interesting to consider the Word Cloud I made of ‘alternative providers’, shown in Figure 1.

Figure 1: Alternative providers from http://www.hefce.ac.uk/reg/register/getthedata/. Key: Blue have University Title; Brown have Degree-Awarding Powers; Pink offer designated courses; Violet deliver HE as a franchise only. Taken from http://cdbu.org.uk/speedy-entrances-and-sharp-exits-letting-in-more-alternative-providers/.
Among those listed are six institutions providing training in various forms of complementary and alternative medicine. Two of these had progressed to the point of having degree-awarding powers, which is one step below having University Title.

Let us be clear: the subject matter that these institutions teach is not endorsed by serious scientists.  David Colquhoun highlighted a worrying trend for UK Universities to give degrees in ‘anti-science’ back in 2007. In Australia, where chiropractic has been taught within the regular University system, it has come under increasing attack by scientists, who note that by awarding degrees in these subjects, one gives credibility to procedures that are placebos at best and dangerous at worst.

Are these concerns just a sign of anti-competitiveness and elitism? Are scientists trying to squeeze out the alternative providers because they think these providers will poach their students, just as McDonalds might try to ensure that Byron Burgers are denied space for development? I’d argue not, and furthermore that Jo Johnson’s analogy shows a startling lack of understanding of what a University is all about. Medicine has had many false leads and it would be the height of arrogance to assume that what we know now is the only truth. But the difference between medicine and alternative medicine is not just that there is evidence for effectiveness for medicine; it’s also that in medicine there is a continuous movement to develop and improve theory and practice, rigorously testing and debating ideas and using scientific methods to evaluate them. It’s difficult to do this well and it often goes wrong, but there is broad agreement about the importance of evidence.

Well, you might say, what about religion, another topic that features heavily among alternative providers? Should theology be banned from Universities because it is not evidence-based? The answer is no, and for similar reasons to those given above: the difference between our traditional Universities and the new providers is that Universities teaching theology consider a range of perspectives and teach students critical thinking. You do not need to be a believer in any God to study theology. In contrast, new providers are often narrow in their focus: many of those with religious affiliations look as if they train students in one religious viewpoint and one only.

Defining the difference between what is suitable and not suitable for inclusion in a University degree course is itself an interesting intellectual exercise. We should not assume that something is good just because it already exists, and is bad if it is new.  But if we just ignore this distinction and have a free-for-all whereby anything can be regarded as higher education provided that there are students willing to pay for it, we will end up with a system in which the terms University and Degree will count for very little, and where the survival of a higher education institution has more to do with its marketing skills than its academic standing. In the past, there were few institutions clamouring to become Universities because it was not easy to make a profit from Higher Education. That has all changed now that higher education providers can get their hands on money from the Student Loans Company. Experience to date suggests we need to have more, rather than less, scrutiny of alternative providers in the current financial climate.

One final point: the alternative providers I have discussed here could do very well on the Teaching Excellence Framework (TEF), which will rate higher education institutions according to three main criteria: student satisfaction, drop-out rates and employability. Indeed, if they recruit their students from among disadvantaged social groups, they might well achieve higher TEF scores than more selective institutions, because benchmarking is used to adjust the outcomes. So under Jo Johnson’s oversight, we could end up with a situation where the quality of teaching at the Anglo-European College of Chiropractic is deemed superior to that at the Universities of Oxford and Cambridge. An interesting thought.

Sunday, 17 July 2016

Cost-benefit analysis of the Teaching Excellence Framework

The government’s new Higher Education and Research Bill gets its second reading this week. One complaint is that it has been rushed in without adequate scrutiny of some key components. I was interested, therefore, to discover that a Detailed Impact Assessment was published in June, specifically to look at the costs and benefits of the various components of the Bill. What I found was quite shocking: we were being told that the financial benefits of the new Teaching Excellence Framework (TEF) vastly outweighed its costs, yet look in detail and this is all smoke and mirrors.

In particular, the report shows that while the costs of TEF to the higher education sector (confusingly described as ‘business’) are estimated at £20 million, the direct benefits will come to £1,146 million, giving a net benefit of £1,126 million (Table 1). How could the introduction of a new bureaucratic evaluation exercise be so remarkably beneficial? I read on with bated breath.

Well, sad to relate, it’s voodoo analysis.  This becomes clear if you press on to Table 12, which shows the crucial data from statistical modelling. Quite simply, the TEF generates money for institutions that get a good rating because it allows them to increase fees in line with inflation. Institutions that don’t participate in the TEF, or those that fail to get a good enough rating, will not be able to exceed the current 9K per annum fee, and so in real terms their income will decline over time. As far as I can make out, they are not included in Table 1. Furthermore, the increases for the compliant, successful institutions are measured relative to how they would have done if they had not been allowed to raise fees.

So to sum up:
  • You don’t need the TEF to achieve this result. You could get the same outcome by just allowing all institutions to raise fees in line with inflation.
  • As noted in the briefing to the Bill by the House of Commons: “the Bill is expected to result in a net financial benefit to higher education providers of around £1.1billion a year. This is in very large part due to the higher fees that providers with successful TEF outcomes will be able to charge students.” (p. 59)
  • The system is designed for there to be winners and losers, and the losers will inevitably see their real income falling further and further behind the winners, unless inflation is zero.
The impact assessment does consider other options, including that of allowing fee increases in line with inflation provided the institution has a satisfactory Quality Assurance rating. This is rejected on the grounds that: “whilst QA is a good starting point, reliance on QA alone and in the longer-term will not enable significant differentiation of teaching quality to help inform student decisions and encourage institutions to improve their teaching quality.” (p. 37).  This makes clear that one consequence (and one suspects one purpose) of TEF is to facilitate the division into institutional sheep and goats, followed by starvation of the goats.

Another option, which was strongly recommended by many of those who responded to the consultation exercise on the Green Paper which preceded the bill, is to remove the link between TEF and fees. In other words, have some kind of teaching evaluation, where the motivation for taking part would be reputational rather than financial.  This too is rejected as not sufficiently powerful an incentive: “the Research Excellence Framework allocates £1.5bn a year to institutions. To achieve parity of esteem and focus between teaching and research the TEF will need to have a similar level of financial implications.” However, this is rather disingenuous. There is no pot of money on offer. We live in a country where we are used to government supporting Higher Education; now, however, the only source of income to universities for teaching is via student fees, but raising fees is unpopular.  The funding of universities will collapse unless they can either find alternative sources of income, or continue to raise fees in line with inflation, and TEF provides a cover story for doing that.

So we have a system designed to separate winners and losers, but the outcome will depend crucially on two factors: the rate of inflation and the rate of increase in students. The figures in the document have been modelled assuming that the number of students at English Higher Education Institutions will increase at a rate of around 2 per cent per annum (Table 12), and that annual inflation will be around 3 per cent. If either growth in numbers or inflation is lower, then the difference between those who do and don’t get good TEF ratings (and hence the apparent financial benefits of TEF) will decline.
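The divergence between winners and losers is simple compound arithmetic. A sketch using the document's assumed 3 per cent inflation, comparing a TEF-successful institution (fee tracking inflation) against one whose fee stays frozen at £9,000, in cash terms per student per year (illustrative only; the impact assessment's own model is more elaborate, folding in student growth and participation rates):

```python
# Illustrative compounding only: how the per-student fee gap grows when one
# institution can raise fees with inflation and another cannot.
BASE_FEE = 9000   # current annual fee cap in pounds
INFLATION = 0.03  # the ~3% annual rate assumed in the impact assessment

for year in range(1, 6):
    winner = BASE_FEE * (1 + INFLATION) ** year  # fee tracks inflation
    loser = BASE_FEE                             # fee frozen in cash terms
    gap = winner - loser
    print(f"Year {year}: winner £{winner:,.0f}, loser £{loser:,.0f}, gap £{gap:,.0f}")
```

With zero inflation the gap vanishes entirely, which is why the apparent financial benefit of the TEF depends so heavily on the inflation assumption.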

What about the anticipated costs of the TEF?  We are told: “Institutions collectively will experience average annual costs of £22m as a result of familiarising, signing up and applying to the Teaching Excellence Framework, once the TEF covers discipline level assessments. This is equivalent to an average of £53,000 per institution, significantly less than the Research Excellence Framework (REF) at £230,000 per institution per year.” (p. 8). One can only assume that those writing this report have little experience of how academic institutions operate. For instance, they say that “Year One will not represent any additional administrative cost to institutions, as we will use the existing QA process.” I did a quick internet search and immediately found two universities that were already advertising for administrators to work on preparing for the TEF (on salaries of around £30-40K), as well as a consultancy agency that was touting for custom by noting the importance of being “TEF-ready”.

I have yet to get on to the section on costs and benefits of opening the market to ‘alternative providers’…..

If you are concerned at the threats to Higher Education posed by the Bill, please write to your MP - there is a website here that makes it very easy to do so.

Further background reading 
Shaky foundations of the TEF
A lamentable performance by Jo Johnson
More misrepresentation in the Green Paper
The Green Paper’s level playing field risks becoming a morass
NSS and teaching excellence: wrong measure, wrongly analysed
The Higher Education and Research Bill: What's changing?
CDBU's response to the Green Paper
The Alternative White Paper