Wednesday 19 August 2020

An A grade mess

The government decided to stop digging an even bigger hole by backtracking on A level results rather than carrying on and applying the now infamous algorithm to GCSEs and making the hole deeper. Nevertheless, this remains a fiasco that will probably generate more headlines yet, with universities now presumably swamped with successful candidates and wondering how to accommodate the extra students while still heeding social distancing. Though no doubt those same universities are relieved to have the maximum possible home-grown candidates to replace some of the revenue lost by the decline in more lucrative overseas students. Predictably some universities are saying they will need more money to take the extra students, as their budgets will still be out of balance with fewer Chinese students in particular. Nine UK universities were reliant on Chinese students for more than 20% of their tuition income, led by Glasgow (31%) and Liverpool (29%). Watch out for another self-righteous storm of entitlement if Rishi doesn't cough up the cash and some students find they have to spend a gap year not seeing the world.

The remarkable grade inflation (38% more A and A* grades than 2019) means this year's cohort will forever have an asterisk by it, a bit like a super-charged athlete in a dope ridden Olympics, but as A levels are pretty much used solely for university entrance decisions that need not matter too much.

Boris Johnson described the process as "robust" and "dependable" only a few days before it was ditched. I can only assume he hadn't been briefed in any detail about what the outcome would look like. Once that was revealed I imagine he was nearly as unhappy with it as any hard done by student since the way the algorithm gave preference to schools with small class sizes and historically good results worked directly counter to his concept of "levelling up".

In the run up to the U-turn I read a bit of background to the story. This was partly to counter my knee-jerk prejudice that, once the teachers' assessments were shown to be preposterously and unrealistically generous, there was probably no good answer to the problem. But also because this felt to me like a train wreck occurring in slow motion from the time the exams were cancelled.

So I Googled "how much grade inflation in teacher assessments" to check a stat I believed I had read: that 40% of teacher assessed grades were adjusted downward through application of the Ofqual algorithm. And I found a remarkable plum: a blog post by someone called Dennis Sherwood from way back in May on the Higher Education Policy Institute's website. HEPI tasked Sherwood with tracking the state of this year's public exams. There was much controversy about the failure of Ofqual to publish full details of its algorithm but it did announce the 'key principles' to be used in ensuring GCSE and A level grades were "as fair as they can be". Those principles were:

  • schools submit their central estimates for each candidate and rank order of candidates for each exam
  • Ofqual apply a standardisation model comparing these estimates with the school's track record (over the last 3 years for A levels)
  • Ofqual adjust the grades to fit the model without changing the ranking order
  • Overall the national grade distributions would be broadly in line with previous years
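
The four principles above boil down to a fairly mechanical procedure, and a rough Python sketch makes the consequences easier to see. This is an illustration only, not Ofqual's actual model: the function name `standardise`, the grade list and the sample figures are all made up, and as a simplifying assumption the teachers' estimated grades are ignored altogether, with grades simply handed out down the school's rank order in proportion to its past results.

```python
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # best to worst


def standardise(ranked_candidates, historical_counts):
    """Hand out grades down a school's rank order to match its past profile.

    ranked_candidates: names in the school's rank order, best first.
    historical_counts: grade -> number of past candidates awarded that grade
        (hypothetical figures standing in for the school's track record;
        must contain at least one past candidate).
    """
    n = len(ranked_candidates)
    total = sum(historical_counts.values())
    awarded = {}
    start = 0
    cumulative = 0
    for grade in GRADES:
        cumulative += historical_counts.get(grade, 0)
        # Number of candidates at this grade or better, rounded half up
        # using integer arithmetic to avoid floating point surprises.
        end = (2 * cumulative * n + total) // (2 * total)
        for name in ranked_candidates[start:end]:
            awarded[name] = grade
        start = max(start, end)
    return awarded


# A made-up school whose best result in the last three years was a B.
history = {"B": 3, "C": 6, "D": 3}
ranking = ["pupil_1", "pupil_2", "pupil_3", "pupil_4"]  # best first
print(standardise(ranking, history))
# {'pupil_1': 'B', 'pupil_2': 'C', 'pupil_3': 'C', 'pupil_4': 'D'}
```

With these invented figures the top-ranked pupil comes out with a B however good they are, simply because the school has no history of anything higher. The ranking is preserved but the grades are entirely a product of the school's past.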

As is clear from the title of his blog post, "Two and a half cheers for Ofqual's standardisation model..." (Note 1), Sherwood was fairly complimentary about the basis for the algorithm. Indeed he said "To me this all makes good sense. The rules are simple. There are no behind-the-scenes statistics and the process can be replicated at every school. So teachers can have confidence that their centre assessment grades, submitted in compliance with their historical averages, will have a high likelihood of being confirmed rather than over-ruled." Note the italics are my emphasis. And the rest of Sherwood's blog post title was "...so long as schools comply".

So the first obvious problem is what if schools don't comply? If teachers did not moderate their grades in accordance with previous distributions, which was standard practice when I was a lad, and lots of them give lots of their students the benefit of the doubt on grade boundaries, then there will be a lot of grade inflation. As Sherwood pointed out "Ofqual's key objective is to prevent grade inflation". So then the model will produce lots of changes and a high likelihood that many teacher assessments would be reduced rather than confirmed.

But the point that grabbed my attention came next, when Sherwood explained why he only gave two and a half cheers. He gave a detailed example for 6 imaginary schools, showing their grade performance over the previous three years. He assigned 60 pupils at each school grades from A to F, each school having an average of 10 candidates in each grade with a range of plus or minus two. If each school again has 60 candidates, each would be expected to submit 10 A* grades. But what if schools feel they've had a good year with a strong cohort? Maybe 12 A*s would be pushing it but surely 11 ought to be ok. If they all put in 11 A*s there are 66 instead of the expected 60 and Ofqual's algorithm will throw a wobbly (Sherwood uses the phrase "the board must intervene"). If asked, every school will have a reason why they are a special case, which Sherwood felt would be difficult to judge fairly. If any of these reasons are accepted another school must reduce its number of A* grades to 9, which Sherwood felt "just won't happen". So he concluded "it's in everyone's interests to submit the average, 10". Now I don't have any evidence to support me on this but many schools presumably did not do this; they just pushed the boat out. So Ofqual would then inevitably moderate each of these imaginary schools down to 10 A*s by using the rank order of the school's candidates, downgrading the candidates ranking lower than 10th on the school's list.
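
The arithmetic of the six-school example is easy to check. The sketch below is an illustration rather than anything from Sherwood or Ofqual: the candidate names and the helper `downgraded` are invented, and it assumes the simplest possible rule, that anyone ranked below a school's allowance loses the top grade.

```python
SCHOOLS = 6
EXPECTED_TOP_GRADES = 10   # historical average per school in the example
SUBMITTED_TOP_GRADES = 11  # what each school optimistically puts in


def downgraded(rank_order, submitted, allowed):
    """Candidates put forward for the top grade but ranked below the allowance."""
    return rank_order[allowed:submitted]


total_expected = SCHOOLS * EXPECTED_TOP_GRADES
total_submitted = SCHOOLS * SUBMITTED_TOP_GRADES
excess = total_submitted - total_expected
print(total_expected, total_submitted, excess)  # 60 66 6

# One school's rank order, best first (hypothetical names).
rank_order = [f"candidate_{i}" for i in range(1, 61)]
print(downgraded(rank_order, SUBMITTED_TOP_GRADES, EXPECTED_TOP_GRADES))
# ['candidate_11']
```

One extra A* per school looks harmless locally, but across the six schools it is a 10% overshoot, and the pupil who pays for it is whoever sits 11th on each school's list.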

However, that's not all. Sherwood said there was one "nasty problem". He even gave the victim an imaginary name, Isaac:

"But what about poor Isaac at school G? He is particularly gifted at Physics, and his school recommends him for an A*, even though the school has never achieved above grade B for years. The submission on behalf of Isaac will easily be identified as an outlier and so is quite likely to be disallowed. Isaac, however, will not be consulted; nor will his teacher. So Isaac will be awarded grade B, consistent with his place at the top of the rank order. He will be a victim, and his school too, for this year’s process traps all schools as prisoners of their pasts."

Wow! So the whole issue of high performing candidates at traditionally poor performing schools was staring everyone in the face 3 months ago! Interestingly, Sherwood didn't seem to think this was a great problem:

"But before we weep too much on Isaac’s behalf, let us remember that Isaac is just one of the huge number of people disadvantaged (to say the very least) by this most pernicious virus, and although this is a pity, many people have suffered far more gravely, and without recourse to the autumn exam at which Isaac can prove his A*++."

Hmm. Sherwood might be an education expert but he hasn't got any political antennae whatsoever. We've had more than a decade now of sound and fury about the dominance of private school candidates and low representation of ethnic minorities at Oxbridge and a whole area of activity has built up around inclusivity and diversity in university admissions. I recall a great hoo-ha about a female state school pupil called Laura Spence from North Tyneside who had straight A*s at GCSE, was predicted to get top grades in her 4 A levels and was the only one of 100 pupils in her school year to apply for Oxbridge. Laura was rejected by Magdalen College Oxford on the grounds that there were 22 candidates, all with similar qualifications, for 5 positions and she had not interviewed as well as others. This was in the year 2000, Tony Blair ("education, education, education") was PM and it became known as the Laura Spence affair (Note 2).

Yes of course Isaac could miss a year, take his exam and prove his worth but this point as much as any other created the problem for the government. Yes there were strange examples of whacking great downgrades from C to U (fail) because the algorithm demanded that if a school was due to get a fail in a subject then its weakest candidate damned well had to fail. But the problem is that, once the results are published, the Isaacs aren't imaginary: they are real teenagers with names and the media will find them within hours. This was the emerging story, running 180 degrees counter to the "levelling up" agenda, that probably created the most discomfort for the government. Gavin Williamson's inability to ask enough questions to see it coming means that he must be a dead man walking. Johnson presumably feels it better not to make a change before the schools go back, in case they don't, and to let him carry the can for any problems with that as well.

Remember, Sherwood said all this 3 months ago, on 18 May. By 23 July he was warning of looming problems. In another HEPI blog (Note 3) he warned that hindsight shouldn't be cited as an excuse when something goes wrong if it was clear it was going wrong: it's better to use foresight. He noted the Education Select Committee had criticised Ofqual for not publishing its algorithm and had expressed concern about how fairness was to be ensured for schools lacking three years of data or with small, variable cohorts. Ofqual did publish a slide pack from a symposium which contained some "good news" (small cohorts recognised as needing special treatment), some "bad news" (appeals process still very narrow and technical) and some "sad but unsurprising news" (the vast majority of schools had given optimistic GCSE and A level grades which would have meant an unprecedented rise in results).

Sherwood asked an obvious question - why were teacher assessed grades required at all if it was the intention to moderate them to previous profiles? He proposed two alternative strategies which could have been adopted. The first was for each school to be told exactly how to comply with the "no grade inflation" policy. The exam boards know the historical pattern for every school and subject and how many candidates were entered for 2020. They calculate how many grades are allowed in each subject at each school and send a form for them to fill in the names. They might also allow schools to exceed a grade allocation where there is robust evidence, depending on how much wiggle room for modest grade inflation Ofqual would allow. The second was to trust the teachers to behave with integrity by supplying a spreadsheet set up to calculate grades based on history and dealing with averaging, rounding and year on year variability but enabling them to make adjustments for exceptional individuals. Neighbouring schools would act as external examiners in vetting the judgements and bodies such as the Sixth Form Colleges Association and unions could have been involved in reviewing to suppress "gaming".

I'm not sure either of these options would have worked out a lot better. The first option would still have produced some Isaac type stories ("why did my school suddenly give me a B when I'd been working at A?"). The second option might have stood a chance. However the point is that people saw these problems coming; the government didn't.

I don't know what the thought process was at schools, but it wouldn't stretch the imagination much to think that many thought that all the other schools would be doing the best for their pupils so they should do the same and give every last one of them the benefit of the doubt. And I wouldn't put it past their leaders to have figured out that gaming the system to extreme would break it. Which I'm sure they would regard as a win.

All that said, I'm not without some sympathy for the teachers doing the assessments. I heard one teacher on the radio plaintively saying that he knew some of his students would make a mess of the exam, he just couldn't predict which ones. I guess this is where the rough justice of the ranked list does its job, perhaps to 80% effectiveness.

Which is probably better than the accuracy obtained in any normal year. It's worth remembering that exams are a poor way of evaluating candidates. This is partly because of marking errors and valid marking judgements: research has shown two markers can mark a paper differently. Ofqual say

'it is possible for two examiners to give different but appropriate marks for the same answer'.

And there is always the arbitrariness of grade boundaries, to the extent that Ofqual also say

'more than one grade could well be a legitimate reflection of a student's performance'.

It is estimated that 40% of exam grades are incorrect. So there are always injustices, with an estimated (by Ofqual presumably!) 750,000 victims of incorrect grades every year in England. Nevertheless exams are the best method yet designed for assessment and are likely to remain so.

Personally I would have been aghast at the prospect of teacher assessments for my school exam grades, particularly at GCSE. As a somewhat shy and quiet student (at least until a switch flicked and I became well and truly gobby) I often didn't contribute much in class, particularly in subjects I was less fond of. Some teachers might have thought I wasn't interested. But as a swot who often understood things better second time round, with a decent short term memory and fairly large appetite for revision, my exam results generally eclipsed my report assessments. Mocks usually went well but I think some teachers would have thought that a fluke, whereas I tended to do better when it was for real.

Indeed I was horrified to discover what my headmaster actually thought of me, over 50 years ago now, when a university interviewer breached protocol and told me what his report said. The report was based on, yes, teacher assessments and also a cosy chat in his study. One peer, having heard what sort of stuff the head found commendable, waxed lyrical about films such as Dr Zhivago (said peer went on to the dizzy academic heights of a PE college). I said what I thought (quelle surprise...). "The candidate appears only to be interested in football and what he calls 'progressive' rock" the lecturer read out. After a pause and possibly seeing my expression he smiled and said "you'll fit in just fine here".

Years later my older son was a victim of an erroneous assessment by a science teacher which affected his science option at GCSE until Mrs H and I intervened. His physics teacher could not explain why the lad had not been allowed to tackle the more challenging version and corrected the error, though only after a term had elapsed and test results meant we just had to ask a simple "can you explain why" question. The teacher whose assessment had been used, known to us as Mr Woodlouse (note 4), was also the assistant head and our older son had always felt "he just doesn't like me". Don't worry son, it was obviously genetic.

I guess my point here is that the extent of grade inflation, indicating a tendency to gross over-marking by teachers, together with a smaller number of cases where they have allowed bias to make their assessments too low, demonstrates that teacher assessments will always be flawed. Some might say so are exams, though I would argue less so, as they are less susceptible to personal bias in either direction.

So one conclusion is that no system is perfect: there are, and always will be, some wrong grade assessments. Does it matter? At an individual level it can, though most people dust themselves off and get on with the next opportunity. Unless a good candidate with a very specific and realistic goal is thwarted it generally won't turn out to matter that much; people with talent will succeed.

In the end the real casualties here are firstly, faith in our politicians and our "system" but secondly the "gold" standard of A levels will be tarnished, perhaps irreparably. The really bad outcome from all this will come in the future when the precedent of using teacher assessments effectively unmoderated is used to press for more teacher driven results, as I'm sure it will be, leaving us with no basis for believing anyone's grades. It smacks of prizes for all, a philosophy I have always disliked: one that has been proven to be harmful as it builds a kind of self esteem which expects success to come easily and is so fragile that it collapses when faced with challenging assignments. 

Indeed when 38% of A level results are A or A* and a preposterous 79% of university degrees are first or upper second we've pretty well got there already. Matthew Syed argued in his Sunday Times column this weekend that we owed it to our young people to allow them to fail, rather than expect them to pass everything. I would go further and say it is helpful for them to learn what they are really good at rather than maybe just competent. If the bright ones get A* for everything how do they know which subjects are the ones in which they might go on to be one of the very best?

In a normal year of course this wouldn't matter a jot. I spoke to a recently retired university admissions tutor who told me he had routinely ignored teacher assessments of candidates and had relied entirely on GCSE results and the university's own interaction with candidates. In other words teacher assessments haven't been worth the paper they are written on for some time, if ever.

Even so I don't begrudge the individual students their opportunities, even if I still think far too many go to university these days. It looked like the impact of covid on overseas students might burst that particular bubble by bankrupting some unis. Not yet, it seems.

I've often been heard to say that, when I have voted Conservative it has been because, on balance the party has a broader and more appropriate "gene pool" for government and a better track record of sound administration. Johnson's government has so far proved not to have the intellectual or stamina bandwidth to cope with the enormous pressures it has faced. I have a lot of sympathy for Matt Hancock who I would argue has, on the whole, done a good job - and probably a better one than 95% of the MPs currently in parliament would have done. He has been let down by his officials and quangos (and yes, by "our" NHS at times, particularly its hubristic leader Sir Simon Stevens). But Hancock has failed to ask key questions at the right time. For example, back in February he might have asked:

"all this PPE - is it still there? Is it still in date? What if we need a lot more for covid than flu? Or different stuff? What if it's needed in places other than hospitals? How do we distribute it to where it's needed?"

Similarly Gavin Williamson, in saying he didn't know until last weekend what the A level results were going to look like, has presumably also been let down by his officials, who surely had a responsibility to warn that there would be lots of Isaacs and the government would look an ass. But if he wasn't being told he should have been asking that question weeks if not months ago.

Dennis Sherwood knew what was going to happen. Officials in the Dept for Education presumably also did. But Gavin Williamson waited for the train to hit the buffers before finding out. An F for homework and a U for foresight then, leading to a f*** u*.

Johnson needs to get his most competent people in the critical jobs before his government begins to look accident prone, John Major style, before it has really got started. Major's government went on to be competent and effective, but the damage was done: the electorate was set on change less than half way through its term. Johnson isn't there yet but he's heading firmly in that direction at the moment. 

Note 1. Dennis Sherwood, a scientist by background, was amongst other things an Executive Director of Goldman Sachs and MD of SRI Consulting (SRI once being known as the Stanford Research Institute, Stanford being ranked in the top five in the world in "major education publications"). His blog post is at https://www.hepi.ac.uk/2020/05/18/two-and-a-half-cheers-for-ofquals-standardisation-model-just-so-long-as-schools-comply/

Note 2. See, for example, Wikipedia. Laura was awarded a $65,000 scholarship by Harvard.

Note 3. https://www.hepi.ac.uk/2020/07/23/hindsight-is-a-wonderful-thing/

Note 4. It's not difficult to guess Mr Woodlouse's real name. Unlike my old grammar school head Mr Williams, he's probably still living. No, I'm not worried about a libel case. My son went on to study physics at Uni which Mr W's original decision, based on him supposedly not being able enough, would have precluded. Teacher assessments can prove to be under as well as over estimates. The majority of them do a good job without bias but I expect most of us have seen it or felt it at some time. Centrally set exam papers don't have it in for individual students.


2 comments:

  1. Very interesting Phil. A couple of things. Johnson is not a detail chap; I can't complain as during my time as leader of Sefton Council I did not consider myself a detail person either. However, I had people who were detail-driven who would keep me posted. My question is has Johnson not got detail people? If he had surely he would have known what was about to go incredibly wrong for many young people. A government too obsessed with Brexit and/or too distracted by their own inability to look stable/capable over Covid19? In other words too many big issues but not enough detail people to spread around? An alternative view is having stuffed the Cabinet with people who were told just to do what Cummings told them to do and no more they've stuck to that mantra?

  2. Just not enough bandwidth so overwhelmed to the point where they've stopped thinking. How much do you need to think to ask "what outcome will we get from the algorithm and will it conflict with the government's agenda"? It does leave me wondering.
    I posted what seems an age ago that Johnson got the management tactics wrong. Hancock should have been told to concentrate on the operation of the health service. Procurement of PPE should have been delegated: all you need is a spec. Distribution of PPE ditto, delegated to a logistics team. Testing ditto. I suspect the health system "blob" wouldn't let go.
    But covid would have swamped most governments I daresay.
    I don't agree about people just being told by Cummings. After all he wanted more free thinkers, aka "weirdos". I suspect the cabinet, like the civil service, just doesn't have enough free thinkers. Cummings is probably in despair of them.
    Williamson reminds me of the old Clement Attlee saying "not up to it, I'm afraid"
