The voodoo art of ranking schools

When I wrote about reforming accountability measures a few months ago, I promised I’d follow up by explaining why I don’t support any method for ranking schools from the very worst to the very best, regardless of whether it uses an attainment measure, a value-added or ‘progress’ measure or some sort of voodoo ‘contextual value-added’ measure. With the KS4 performance tables reminding us of the importance of school context, it seems like a good time to summarise why I’d like to see school rankings go.

I’ll start with a technical argument about why we cannot isolate the ‘effect’ of a school in observational data, then move on to a slightly more philosophical argument about what a school ‘effect’ is anyway. Finally, I’ll say how I like to compare schools.

The technical argument against calculating school quality

Everyone agrees that school attainment doesn’t tell us much about school quality because variation in student attainment also reflects variation in home investments in education, variation in genetic endowments that influence ability to learn, prior knowledge learnt at earlier schools, and so on. Dave Thomson has shown how most of the variation in pupil attainment lies within schools rather than between schools.

Value-added or progress measures purport to fix this by comparing a student’s attainment to those who had a similar prior attainment. This seems fair on schools, right? It doesn’t seem terrible, until you observe that schools with high attaining intakes have high progress scores and those with lower attaining intakes have lower progress scores, on average.

This might well reflect true differences in the quality of schools by intake of the students – we know, for example, affluent schools find it easier to recruit suitably qualified teachers and the students might well benefit from having smarter peers. HOWEVER, the differences could equally be associated with things that have nothing to do with school quality. Affluent schools tend to benefit from families who are more invested in ensuring their students succeed at school. There could equally be ‘phantom’ compositional effects that arise purely from measurement error (see Tom Perry’s work on this). (The best example of this is the kids who get into grammar school despite very poor SATs results, which almost certainly understate their true ability given they passed an 11+, and who then go on to do extremely well on GCSE progress measures.)
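To make the measurement-error point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration (the number of schools, the variances, the function name `simulate`); it is not Tom Perry’s model or the DfE’s progress calculation. Schools differ only in the ability of their intake, the baseline test is a noisy measure of that ability, and there is no school effect whatsoever. A simple progress score, each school’s average residual from a pupil-level regression of outcome on prior attainment, still ends up strongly related to intake.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_schools=200, pupils_per_school=100, seed=0):
    """Return (regression slope, correlation of school intake with school progress)."""
    rng = random.Random(seed)
    school_ids, priors, outcomes = [], [], []
    for school in range(n_schools):
        intake_mean = rng.gauss(0, 1)  # schools differ ONLY in their intake
        for _ in range(pupils_per_school):
            ability = intake_mean + rng.gauss(0, 1)
            priors.append(ability + rng.gauss(0, 1))      # noisy baseline test
            outcomes.append(ability + rng.gauss(0, 0.5))  # no school effect at all
            school_ids.append(school)

    # Pupil-level least-squares line predicting outcome from prior attainment.
    mp, mo = mean(priors), mean(outcomes)
    slope = (sum((p - mp) * (o - mo) for p, o in zip(priors, outcomes))
             / sum((p - mp) ** 2 for p in priors))
    intercept = mo - slope * mp

    # School progress score = mean residual of its pupils.
    resid_by_school = [[] for _ in range(n_schools)]
    prior_by_school = [[] for _ in range(n_schools)]
    for s, p, o in zip(school_ids, priors, outcomes):
        resid_by_school[s].append(o - (intercept + slope * p))
        prior_by_school[s].append(p)
    progress = [mean(r) for r in resid_by_school]
    intake = [mean(p) for p in prior_by_school]
    return slope, pearson(intake, progress)


slope, gradient = simulate()
print(f"slope on prior attainment: {slope:.2f}")
print(f"correlation of school intake with progress score: {gradient:.2f}")
```

Because the baseline is noisy, the fitted slope is pulled below 1, so pupils whose true ability is high tend to beat their prediction wherever they go to school. Schools full of such pupils pick up positive ‘progress’ even though, by construction, no school in this toy world is better than any other.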

Now, what contextual value-added does is run a statistical procedure to remove this social gradient in school effects. Though the statisticians who run these models claim they are ‘accounting’ for the impact of family background on attainment, they can do nothing of the sort since we simply do not know how much of the advantage that affluent schools have is down to the families that go there or to their superior school practices and teaching quality.

As I’ve written in the past, attributing NONE of the social gradient that exists across school progress measures to the schools themselves is as wrong as attributing ALL of the social gradient to the schools.

This is the crux of the technical argument: When we observe that a group of students – whether girls, Asians or the children of bankers – are doing well at particular schools, observational data does not let us decompose whether they are doing better because that group tends to be attending better quality schools or whether they are doing better because of social or biological factors unrelated to the school they attend. (This issue is at the heart of the debate about whether London schools are great or whether London schools have great kids in them.)
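The identification problem can be sketched in a few lines of simulation. This is a toy under stated assumptions, not a model of real schools: `simulate_world`, the gradient of 0.3 and the spread of idiosyncratic school quality are all hypothetical. The parameter `share_school` sets how much of the social gradient is truly caused by schools rather than families. Whatever its value, the observed progress scores are identical, so the data alone cannot tell us which world we live in.

```python
import random
from statistics import mean


def simulate_world(share_school, n_schools=500, gradient=0.3, seed=1):
    """share_school: fraction (0 to 1) of the social gradient truly caused by schools."""
    rng = random.Random(seed)
    affluence = [rng.gauss(0, 1) for _ in range(n_schools)]
    quirk = [rng.gauss(0, 0.2) for _ in range(n_schools)]  # quality unrelated to intake

    true_school = [share_school * gradient * z + q for z, q in zip(affluence, quirk)]
    family = [(1 - share_school) * gradient * z for z in affluence]
    observed = [s + f for s, f in zip(true_school, family)]  # all that the data show

    # A CVA-style adjustment: strip out the linear link between progress and affluence.
    mz, mo = mean(affluence), mean(observed)
    slope = (sum((z - mz) * (o - mo) for z, o in zip(affluence, observed))
             / sum((z - mz) ** 2 for z in affluence))
    cva = [o - (mo + slope * (z - mz)) for z, o in zip(affluence, observed)]

    def error(estimate):
        # Mean absolute gap between an estimate and the true school effect.
        return mean(abs(e - t) for e, t in zip(estimate, true_school))

    return observed, error(observed), error(cva)


for share in (0.0, 0.5, 1.0):
    _, progress_error, cva_error = simulate_world(share)
    print(f"share due to schools={share:.1f}  "
          f"progress error={progress_error:.3f}  CVA error={cva_error:.3f}")
```

In this sketch the raw progress score is only accurate when schools cause all of the gradient, and the CVA-style score is only accurate when they cause none of it. Since the observed data are the same in every case, choosing between the two measures is an assumption about the world, not something the statistics can settle.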

The philosophical argument against comparing schools with different intakes

So far I’ve argued that it would be nice to know the ‘school effect’ but it is impossible to calculate it in observational data without random assignment of students to schools. Let’s go a little further and ask what is this ‘school effect’ you are trying to calculate anyway? I want to give you a little thought experiment. Suppose there are two schools in a town. One serves an estate of educationally-disengaged families (Progress -0.4) whilst the other serves middle-class commuting families (Progress +0.6). Now suppose we switch all the students in the two schools over one day. What can the old progress scores tell us about the likely future performance of the kids?

I’m sure you’ll agree the answer is ‘not much’, and it isn’t just because those old Progress scores captured differences in family investments as well as differences in school processes. More importantly, these school processes – the policies and practices of teachers in the school – were themselves chosen as a response to the student cohorts they were faced with. The arrival of a community of pupils with a different set of social norms and needs will force a school to rapidly adapt its practices to better meet those particular needs. School qualities will morph as the students change.

We simply cannot describe a thing called a ‘school’s quality’ – the effectiveness of the school’s policies, processes and instructional methods – in a way that is divorced from the nature of the community the school serves. This is the crux of the philosophical argument against drawing comparisons of the quality of school practice and policy at two schools serving different types of communities. They are different because CONTEXT is different.

Both progress scores and contextual value-added place schools with very different intakes onto a single scale, forcing comparisons that we should not treat as meaningful. What should we do instead?

Why I prefer looking at alike schools

When I visit schools and want a quick perspective on their test results, I am not remotely interested in where they rank in the league tables that list schools from the very worst to the very best since those are voodoo statistics.

I ask a much less demanding question – how are they doing compared to other schools that seem to serve similar types of communities? Thankfully, FFT Education Datalab has a brilliant website where you can pick what you mean by ‘similar’ (e.g. similar FSM, EAL, size, region, prior attainment, etc.) and take a look at the outcomes.

Comparing schools to alike schools requires less sophisticated outcome measures (we can just use attainment measures rather than calculating progress) and could even lead us to collect baseline attainment data much less frequently (since we only need to use it to check that schools are similar in their intake every so often).

It is a great approach for school improvement since it allows you to seek out schools that are actually like yours, to learn more about how they deal with similar challenges to the ones you might have.

Some argue it leads to a ‘poverty of expectations’ since deprived schools would only be compared to other deprived schools. This is an assertion that doesn’t stand up to interrogation of the data. Taking primary schools as an example, closing the gap between each lower performing primary and the median primary within its group of similar schools would lead to a national 6 percentage point improvement on the EXP in RMW metric (the share of pupils reaching the expected standard in reading, maths and writing). Getting all primaries up to the 90th percentile within their similar group would lead to an improvement of 17 percentage points. Sure, this isn’t the paradise where every primary school does as well as the best in the country, but (in case you hadn’t noticed) bashing schools serving disadvantaged communities over the head with the results of those who serve affluent communities doesn’t seem to have brought us to the promised land either. I’d take a mere 6 percentage point improvement in the headline pass rate in exchange for dropping the school bashing based on voodoo statistics any day.
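For anyone who wants to see the arithmetic behind that kind of figure, here is a rough sketch of the calculation. The data are invented toy numbers, not the FFT data behind the 6 and 17 point figures, and `national_gain`, the nearest-rank percentile rule and the pupil weighting are my assumptions about one reasonable way to do it. Each school below the chosen percentile of its similar-schools group is lifted to that percentile, and the national rate is recomputed.

```python
from math import ceil


def national_gain(schools, percentile):
    """schools: list of (group, pupils, pct_at_expected_standard).

    Returns the pupil-weighted national gain, in percentage points, if every
    school below the given percentile of its own group rose to that level.
    """
    by_group = {}
    for group, _, pct in schools:
        by_group.setdefault(group, []).append(pct)

    # Nearest-rank percentile within each group of similar schools (an assumption).
    target = {}
    for group, values in by_group.items():
        ordered = sorted(values)
        rank = max(1, ceil(percentile / 100 * len(ordered)))
        target[group] = ordered[rank - 1]

    total_pupils = sum(pupils for _, pupils, _ in schools)
    before = sum(pupils * pct for _, pupils, pct in schools) / total_pupils
    after = sum(pupils * max(pct, target[group])
                for group, pupils, pct in schools) / total_pupils
    return after - before


# Invented example: two groups of three equally sized schools.
toy = [
    ("deprived", 100, 50), ("deprived", 100, 60), ("deprived", 100, 70),
    ("affluent", 100, 70), ("affluent", 100, 80), ("affluent", 100, 90),
]
print(round(national_gain(toy, 50), 2))  # 3.33
print(round(national_gain(toy, 90), 2))  # 10.0
```

Note that no deprived school is ever asked to match an affluent one here, yet the national figure still rises, which is the point of the paragraph above.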

And then I must mention the parents, I suppose. Yes, you’re absolutely right. Comparisons with schools like yours aren’t suitable for helping parents identify the best school for their child. But, to be clear, NEITHER ARE LEAGUE TABLES that push nonsense metrics and hugely overstate both the value of seeking out an affluent school and the importance of the school choice decision! I’m very happy for parents to express a preference for a particular school; we shouldn’t lie to them and pretend there is some magic number that will help inform them as to which school their child will perform best in.

Conclusion

Many people who don’t like school rankings are warm, optimistic people who believe life would be rosy without school accountability. I’m not one of those people – monitoring schools has its merits, but we have to do it on the basis of comparisons that are valid.

I have no reason to believe that all schools are equally good. If we could assign one child, cloned nine times over, to ten different schools chosen at random, I suspect we would find some material differences in how well they attained in particular subjects. But short of running lotteries for school places, we can only speculate about how much school assignment matters.

Regardless of how we measure school quality we cannot rank schools across the country from the very worst to the very best. We cannot do this using ‘raw’ exam results, using so-called ‘progress’ measures or even using the ‘contextual value-added’ (CVA) measures that force average school quality to be equalised across demographic communities. We cannot rank schools because school inputs and family inputs to attainment are not separable in observational data.

Let’s stop comparing schools that serve different types of communities and stick to comparisons that are both (reasonably) valid and useful in our drive to help schools improve.

9 thoughts on “The voodoo art of ranking schools”

Raymond Thompson

If the genome has a majority effect on school achievement – world wide research is suggesting this – how can schools organise learning procedures that can blunt this immovable disadvantage? Using the pupil premium has little or no effect. This supports the discussion on school ranking systems that offer no answers. Depressing.

February 7, 2020 at 3:42 pm
1. Nick von Behr
  
  Hear, hear, hear, hear ad finitum! If only politicians would listen ….
  
  February 21, 2020 at 5:51 pm
@TeacherToolkit

Becky, I can only give you the best compliment I can think of: I wish I had written this and that every parent, journalist and school leader reads this blog.

February 10, 2020 at 9:53 am
Paul

I like the thought experiment scenario. Taking it a step further though, consider what the outcomes for those two different groups of children would be if they remained in their ‘alternate’ schools for a suitable length of time. My not too jaded and cynical opinion would be that the cohort of students from the middle-class commuting families would outperform their original school predictions, potentially by a significant amount.

February 12, 2020 at 2:46 pm