(Half-written blogs haunt my google drive so I’m copying Toby Payne-Cook’s great housekeeping idea and publishing my 12 blog posts of Christmas. This is post 5 of 12 – I hope someone gets something out of reading these drafty ideas, but even if they don’t, I’ll feel better for being able to delete the drafts.)
Eddie (aged 7) and I have been chatting about assessment theory.
- Eddie: Some children in my class know all the grades you get at secondary school. But I only know one grade and that’s grade B.
- ME: Well, let me introduce you to the grades A, C, D, E and F [Doesn’t seem fair to tell a 7-year-old about U]
- Eddie: Do I need to get all the questions right to get a grade A?
- ME: Sometimes, it depends on the difficulty of the test. If it was very difficult you might be given an A if you get them nearly all right
- E: How many is “nearly”?
- ME: Good question… Sometimes people write tests where they know most children will get most questions wrong and where NOBODY gets everything right.
- E: That’s not very kind
We rarely write tests for students by planning in advance what sort of marks distribution we would like. This is usually fine. After all, a teacher writing an ad-hoc test might only get to learn about the shape of the marks distribution for their class once the test has been sat.
However, for government officials who write tests, the scope and difficulty of questions can have a huge effect on how we choose to teach students to prepare for the test itself. Psychometricians often assume we always want a test that is a good discriminator in all parts of the attainment distribution, but the act of creating this test can distort what gets taught in quite undesirable ways.
I am going to describe two primary government assessments – one (phonics) that I think incentivises teachers to teach the things that government wants them to teach and the other (SATs maths) that maths tutors and teachers claim incentivises them to teach some pupils the ‘wrong’ things.
Phonics and the top-heavy marks distribution
The phonics test at the end of Year 1 is a classic test that fails to discriminate well between standards of phonetic decoding or reading for those in the top half of the attainment distribution. Why? It uses a set of quite predictable and constrained question items to see whether the child can phonetically decode reasonably proficiently. It aims to test mastery at quite a low standard within the much wider domain of being able to read.
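To see why the difficulty of the items drives the shape of the marks distribution, here is a toy simulation. It uses a standard Rasch-style item response model; the student numbers and difficulty values are invented purely for illustration, not taken from any real assessment data:

```python
import random
import math

random.seed(42)

def p_correct(ability, difficulty):
    """Rasch-style probability that a student answers an item correctly."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

def simulate_scores(item_difficulties, n_students=10000):
    """Simulate total marks for students with normally distributed ability."""
    scores = []
    for _ in range(n_students):
        ability = random.gauss(0, 1)
        score = sum(random.random() < p_correct(ability, d)
                    for d in item_difficulties)
        scores.append(score)
    return scores

# A mastery-style test: 40 predictable items, all well below average ability.
easy = simulate_scores([-2.0] * 40)

# A discriminator-style test: 40 items spread right across the ability range.
spread = simulate_scores([-2 + 4 * i / 39 for i in range(40)])

# The easy test piles marks up near the 40-mark ceiling (top-heavy);
# the spread test centres marks around the middle (bell-curve-ish).
print(sum(easy) / len(easy))
print(sum(spread) / len(spread))
```

The easy test cannot separate the top half of the ability distribution (almost everyone scores near full marks), which is exactly the phonics-check shape; the spread test ranks students nicely on a curve, but only by making sure plenty of children get plenty of questions wrong.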
Imagine we tried to ‘improve’ the assessment so that it discriminated better between all those children who score between 33 and 36. We could add in a standard reading fluency element, for example. What could possibly go wrong… Well, it might mean that teachers then invest more time in these more proficient readers, at the expense of others in the class, in order to help them achieve on a test with greater difficulty. But something else much worse might also happen.
I heard Christopher Such explain (on the Thinking Deeply about Primary Education podcast) how word memorisation can yield short-term gains for students struggling to phonetically decode, but ultimately damage their long-term prospects of learning to read fluently. For a child like this, who is struggling to achieve a good mark at the end of Key Stage 1, it is possible that by introducing a standard reading fluency assessment we encourage some word memorisation over phonetic decoding in order to maximise their score.
The key point here is that, if we want students to achieve mastery in a skill, then we should write a test where marks can be gained only if the student improves their mastery of that skill, and not for any other reason. If the consequence is that these mastery assessments are poor discriminators at the top end of the attainment distribution, then that is absolutely fine. The assessment has done its job of incentivising the system to teach the thing we want it to teach to the students we most want it taught to!
KS2 maths and the bell curve with a low average mark
The primary teacher who blogs under the name Solomon Kingsnorth is running an incredible crusade to persuade others that we need to reform the primary National Curriculum so that students have a fighting chance of arriving at secondary school with good levels of fluency in reading, writing and numeracy. He thinks the SATs provide important incentives to primary teachers to deviate away from this goal. I agree with him. However, not everyone does. (Particularly those who subscribe to the Gibbian view that a great curriculum is a massive curriculum with lots of stuff in it.)
I once watched Solomon Kingsnorth and Daisy Christodoulou have a short Twitter exchange about the KS2 maths SATs. Roughly speaking it went like this [I should probably search for the text…]:
- SK: It’s terrible that you can reach the ‘Expected’ level in SATs maths and get most of the answers wrong.
- DC: It’s fine. It’s just a difficult test which is a good discriminator, i.e. written in order to reliably rank students on a curve.
Who is right? Solomon or Daisy? Well, both are right, of course. But one perspective prioritises the ability of the test to discriminate well between those who are poor, average, good and excellent at maths. The other perspective prioritises the washback effect of the test on what is taught. Specifically, Daisy’s perspective doesn’t consider how having a difficult maths SATs paper that tests an enormous curriculum affects how tutors, parents and teachers help students prepare for it.
Maths is a strongly hierarchical subject where mastery matters. Secondary maths teachers would much rather have all students arrive in Year 7 having mastered the content of maths in Years R-4, than have them arrive, as they do, with a pretty sketchy idea of most of the curriculum up to Year 6. Not all content is equal in maths because mastery in some topics is central to everything (e.g. multiplicative reasoning) and mastery in other topics is largely irrelevant (e.g. Roman numerals).
A student who is currently scoring about 30% on the maths SATs paper arrives at a tutor to get help to pass. What should the tutor do to help them if they only have 12 lessons together? At this stage, there is little hope of doing the thing that would help the student in the long term – improve their arithmetic understanding and fluency. So, the tutor picks out peripheral topics that don’t rely on these core numerical skills, teaching the student how to spot and answer these questions: coordinates, shape, Roman numerals, reading graphs etc… The tutor succeeds in getting the student an Expected SATs score. The student arrives in Year 7 ill-prepared to tackle the secondary curriculum (where all those peripheral topics can be taught from scratch with little difficulty if the student has strong numeracy skills).
How can we incentivise teachers to teach primary maths in a manner that is consistent with the hierarchical knowledge structure of the subject? Well, that’s a big topic that may have to wait until another day or year. We might want to reflect on whether it is equitable to have a primary maths curriculum so large that most students cannot master it. If we can’t (i.e. won’t) reduce the size of the curriculum, we could improve incentives by restructuring how marks can be gained within the SATs (e.g. should a child who cannot answer 4 x 8 be allowed to gain marks on a peripheral Year 6 topic?). I may blog on this one day (but given my track record, I probably won’t).
(Thanks to Eddie and my mum for the ideas in this post – my mum is a retired primary teacher who still likes to tutor in maths from time to time.)