What if we cannot measure pupil progress?

Testing and recording what students know and can do in a subject has always been part of our education system, especially in secondary schools where teachers simply cannot hold in their head accurate information about the hundreds of students they encounter each week. However, measuring progress – the change in attainment between two points in time – seems to be a rather more recent trend. The system – headteachers, inspectors, advisors – often wants to measure something quite precise: has a child learnt enough in a subject this year, relative to other children who had the same starting point?

The talks I have given recently at ResearchED Durrington and Northern Rocks set out why relatively short, standardised tests that are designed to be administered in a 45-minute or one-hour lesson are rarely going to be reliable enough to infer much about individual pupil progress. There is a technical paper and a blog post that outline some of the work that we’ve been conducting on the EEF test database that led us to start thinking about how these tests are used in schools. This blog post simply sets out a few conclusions to help schools make reasonable inferences from test data.

We can say a lot about attainment, even if progress is poorly measured

No test measures attainment precisely, and short tests are inevitably less reliable than long ones. The typical lesson-long tests used by schools at the end of a term or year are reliable enough to infer approximately where a student sits on a bell curve that ranks all test-takers from the least good to the best in the subject. This works OK, provided all the students are studying the same curriculum in approximately the same order (a big issue in some subjects)!

Let’s take a student who scored 109 in a maths test at the start of the year. We cannot use that single score to assert that they must be better at maths than someone scoring 108 or 107. However, it is a good bet that they are better at maths than someone scoring 99. This is really useful information about maths attainment.

When we use standardised tests to measure relative progress, we often look to see whether a student has moved up (good) or down (bad) the bell curve. This student scored 114 at the end of the year. On the face of it this looks like they’ve made good progress, and learnt more than similar students over the course of the year. However, 109 is a noisy measure of what they knew at the start of the year and 114 is a noisy measure of what they knew at the end of the year. Neither test is reliable enough to say if this individual pupil’s progress is actually better or worse than should be expected, given their starting point.
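To make the point about two noisy scores concrete, here is a minimal sketch using the classical test theory formula for the standard error of measurement. The numbers fed into it are assumptions for illustration, not figures from the tests discussed here: a standardised scale with a standard deviation of 15, a reliability of 0.9 for each test, and independent errors on the two occasions. The function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def change_interval(score_start, score_end, sd=15.0, reliability=0.9, z=1.96):
    """Approximate interval for the true change between two noisy scores.

    Assumes (for illustration only) that both tests share the same SD and
    reliability and that their measurement errors are independent, so the
    standard error of the difference is sqrt(2) * SEM.
    """
    sem = standard_error_of_measurement(sd, reliability)
    se_diff = math.sqrt(2.0) * sem
    change = score_end - score_start
    return change - z * se_diff, change + z * se_diff


# The student in the example: 109 at the start of the year, 114 at the end.
low, high = change_interval(109, 114)
print(f"Observed change: +5, approximate 95% interval: {low:.1f} to {high:.1f}")
```

Under these assumed values the interval runs from roughly -8 to +18, so a five-point rise is compatible with the student having slipped back, stood still or moved well ahead. A less reliable test would widen the interval further.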

Dylan Wiliam (2010) explains that the challenge of measuring annual test score growth occurs because “the progress of individual students is slow compared to the variability of achievement within the age cohort”. This means that a school will typically find that only a minority of their pupils record a test score growth statistically significantly different from zero.

Aggregation is the friend of reliability

You can make a test more reliable by making it longer or by spreading it over multiple papers, but this isn’t normally compatible with the day-to-day business of teaching and learning. However, teachers who regularly ask students to complete class quizzes and homework have the opportunity to compile a battery of data on how well a student is attaining. Although teachers will understandably worry that this ‘data’ isn’t as valid as a well-designed test, intelligently aggregating test and classwork data is likely to lead to a more reliable inference about a pupil’s attainment than relying on the short end-of-term test alone. (Of course, this ‘rough aggregation’ is exactly what teachers used to do when discussing attainment with parents, before pupil tracking was transferred from the teacher markbook to the centralised tracking software!)
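The effect of aggregation can be sketched with the Spearman-Brown formula, which predicts the reliability of a measure built from k comparable pieces. This is only an illustration: the single-quiz reliability of 0.6 is an assumed figure, and the formula treats the pieces as parallel measures of the same thing, which real class quizzes and homework only approximate.

```python
def spearman_brown(reliability, k):
    """Predicted reliability when k parallel measures are combined.

    reliability: reliability of a single measure (between 0 and 1)
    k: number of comparable measures being aggregated
    """
    return k * reliability / (1.0 + (k - 1.0) * reliability)


# Assumed, illustrative reliability of one short class quiz.
single_quiz = 0.6

for k in (1, 2, 5, 10):
    print(f"{k:>2} quizzes combined: reliability about {spearman_brown(single_quiz, k):.2f}")
```

With these assumed inputs, five combined quizzes reach a reliability of about 0.88 and ten reach about 0.94, which is why a markbook full of modest assessments can support a firmer judgement about attainment than one short test.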

Teacher accountability is the enemy of inference

Teachers always mediate tests in schools. They might help write the test, see it in advance, warn pupils or parents about the impending test, give guidance on revision, advise pupils about the consequences of doing badly, and so on. If the tests are high-stakes for teachers (i.e. used in performance management) and yet low-stakes for the pupils, it can become difficult for the MAT or school to ensure tests are sat in standardised conditions.

For example, if some teachers see the test in advance they might distort advice regarding revision topics in a manner that improves test performance but not the wider pupil knowledge domain. Moreover, some teachers may have an incentive to try to raise the stakes for pupils in an attempt to increase test persistence. The impact of the testing environment and perception of test stakes has been widely studied in the psychometric literature. In short, we need to be sure that standardised tests (of a standardised curriculum) are sat in standardised conditions where students and teachers have standardised perceptions of the importance of the test. For headteachers to make valid inferences across classrooms, or across schools, they need to be clear that they understand how the stakes are being framed for all students taking the test, even those who are not in their own school!

I think this presents a genuine problem for teacher accountability. One of the main reasons we calculate progress figures is to try to hold teachers to account for what they are doing, but the very act of raising the stakes for teachers (and not necessarily for pupils) can create variable test environments that threaten our ability to measure progress reliably!

The longer a test is in place, the more it risks distorting curriculum

A test can only ever sample the wider subject knowledge domain you are interested in assessing. This can create a problem where, as teachers become more familiar with the test, they ‘bend’ their teaching towards the test items. Once this happens, the test itself becomes a poor proxy for the true subject knowledge domain. There are situations where this can seriously damage pupil learning. For example, many primary teachers report that one very popular standardised test is rather weak on arithmetic compared to SATs; given how important automaticity in arithmetic is, let’s hope no year 3, 4 or 5 teachers are being judged on their class performance in this test!

Our best hopes for avoiding serious curriculum distortion (or assessment washback) are two-fold. First, lower the stakes for teachers (see above). Second, make the test less well-known or less predictable for teachers. In the extreme, we hear of schools that employ external consultants to write end-of-year tests so that the class teachers cannot see them in advance. More realistically, frequently changing the content of the test can help minimise curriculum distortion, but is clearly time-consuming to organise. Furthermore, if the test changes each year then subject departments cannot straightforwardly monitor whether year group cohorts are doing better or worse than previous years.

None of this is a good reason not to make extensive use of tests in class!

Sitting tests and quizzes is an incredibly productive way to learn. Retrieval during a test aids later retention. Testing can produce better organisation of knowledge or schemas. As a consequence of this, testing can even facilitate retrieval of material that was not tested and can improve transfer of knowledge to new contexts.

Tests can be great for motivation. They encourage students to study! They can improve metacognitive monitoring to help students make sense of what they know (and don’t yet know).

Tests can aid teacher planning and curriculum design. They can identify gaps in knowledge and provide useful feedback to instructors. Planning a series of assessments forces us to clarify what we intend students to learn and to remember in one month, one year, three years, five years, and so on.

Are we better off pretending we can measure progress?

I’m no longer sure that anybody is creating reliable termly or annual pupil progress data by subject. (If you think you are then please tell me how!) Perhaps we don’t really need to have accurate measures of pupil progress to carry on teaching in our classrooms. Education has survived for a long time without them. Perhaps SLT and Ofsted don’t really mind if we aren’t measuring pupil progress, so long as we all pretend we are. Pretending we are measuring pupil progress creates pressure on teachers through the accountability system. Perhaps that’s all we want, even if the metrics are garbage.

Moreover, I don’t know whether the English education system can live in a world where we know that we cannot straightforwardly measure pupil progress. But I am persuaded by this wonderful blogpost (written some time ago) by headteacher Matthew Evans that we must come to terms with this reality. Like many other commentators on school accountability, he draws an analogy with The Matrix film in which Neo must decide whether to swallow the red or blue pill:

Accepting that we probably can’t tell if learning is taking place is tantamount to the factory manager admitting that he can’t judge the quality of the firm’s product, or the football manager telling his players that he doesn’t know how well they played. The blue pill takes us to a world in which leaders lead with confidence, clarity and certainty. That’s a comfortable world for everyone, not just the leader.

He goes on to argue, however, that we must swallow the red pill, because:

However grim and difficult reality is, at least it is authentic. To willingly deceive ourselves, or be manipulated by a deceitful other (like Descartes’ demon), is somehow to surrender our humanity.

And so, what if we all – teachers, researchers, heads, inspectors – accept that we are not currently measuring pupil progress?

What then?

39 thoughts on “What if we cannot measure pupil progress?”

Ed

This makes so much sense. I still don’t think we even know what “progress” means and looks like.
For example, let’s say in year 7 geography pupils study 3 topics: Rainforests, Hazards and then Rivers. After each topic they undertake an end-of-topic assessment. What is progress? If the pupil scores 50% in all 3, have they made progress? If they score 50% in Rainforests, 70% in Hazards and 60% in Rivers, have they made progress?
I suppose adding the class average next to each score helps in determining whereabouts they sit on the bell-shaped distribution curve.
Rainforests 50% – Class average 40%. They are 10% above the class average – is this progress?
Hazards 70% – Class average 50%. They are now 20% above the class average – have they made progress compared to topic 1?
Rivers 60% – Class average 60%. They are now equal to the class average. Have they now regressed?
Then let’s add their KS2 starting point data. Let’s say they have an “average” KS2 score for the class. Does this mean they made “Good” progress in Rainforests, “Excellent” progress in Hazards and then “Average” progress in Rivers?

May 23, 2018 at 5:52 pm
englishteacher688

Reblogged this on The Learning Project and commented:
Insightful and timely.

May 24, 2018 at 3:34 am
englishteacher688

Thanks for this insightful comment. I’m in the middle of trying to fathom how data might be reliably used in schools to drive meaningful changes to instruction in the classroom. I think there is a disconnect between numbers on the page and making valid inferences from them. There is also a fear culture around data because of the accountability you mention, and this leads to a cart-before-the-horse situation where weeks of curriculum time are spent preparing for a test. While this is not necessarily a bad thing to help students prepare, it always strikes me as odd that teachers write the tests or exams and then spend oodles of time making sure that students can’t fail them. I’m not sure we derive anything useful from this in terms of guiding our teaching. I’m also increasingly baffled by the idea of the reliability of progress measures being used to indicate likely future performance. CAT4, PTEs, flight paths and all of that seem to me to reduce students to robots with little idea of any external influence on their learning. The mantra seems to be that if you can accurately predict GCSE outcomes then you are a good teacher. I’m not satisfied by this at all. Anyway, thanks for a thought-provoking read.

May 24, 2018 at 3:43 am
Doug Green

Becky: Great article. Look for it posted today at http://DrDougGreen.Com. I will also be following you on Twitter. Hope you follow back. Also, the last line of the 1st paragraph, “learn” should be “learned”. Keep up the good work.

May 25, 2018 at 12:16 pm
aartisrivastav

An excellent read. Raises very relevant questions relating to actual progress and performance in a test. I especially appreciate the thoughts expressed by Ed.

May 31, 2018 at 6:35 am
higginsonmaths

Really interesting article, Rebecca. I agree about the ‘noise’ with S-Scores, but could progress be measured by looking at whether the score was within the 90% CI or outside it?
Expected progress would be defined by staying within the CI, Good as above it and Outstanding as well above it.

I have just written a blog on this here

https://higginsonmaths.wordpress.com/

Please comment as this is my first real blog.

February 14, 2021 at 5:13 pm