Technological disruptions to teaching

(This is a rough description of the talk I gave at ResearchED London in September 2023)

Come with me twenty years into the future and try to imagine the person teaching your classes or subject at school. Would their job be similar to that of today’s teachers? According to a recent survey by Teacher Tapp, 16% of teachers felt the job would be radically different, while a further 30% believed it would be somewhat different. A year ago, I would have disagreed with them, but in this post, I am going to explore how teaching could undergo significant changes over the next two decades.

No one is more surprised that I am writing this post than I am. Over the past 50 years, teaching has mostly remained the same. A couple of years ago, I co-authored a book with Matt Evans and Ben White titled ‘The Next Big Thing in School Improvement’, where we discussed why the school system tends to experience long periods of stasis, even as society evolves technologically and socially.

Technologists such as Bill Gates often claim that education is on the cusp of a revolution. Their predictions fail to materialise partly because they don’t fully grasp the complexities of schooling or the nature of learning. But could they finally be right this time?

This post will explore the ways teaching might be assisted or even replaced in the future. But it’s also a discussion about the present. Our Teacher Tapp panel indicates that over a third of teachers have already experimented with AI tools like ChatGPT for planning, resource creation, and assessment. They say these tools have not only saved them significant amounts of time but have also spurred creative solutions. So, how could AI be so useful to teachers if it’s not as intelligent as humans?

The Genius Yet Error-Prone Pattern Spotter

In this post, I focus almost exclusively on large language models (LLMs) that mimic human language. These are fundamentally different from established AI technologies used in mathematics, statistics, or robotics. Simply put, these LLMs are pattern spotters. They generate words based on a model of human language, constructed from a vast dataset of written text. The underlying mechanism can be summarised as: ‘Given the words you have written, and the words I have generated in response thus far, what is the next word that would best maintain the consistency of this text according to my model?’
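
If you want to see quite how simple this loop is, here is a minimal sketch in Python using the small open-source GPT-2 model via the Hugging Face transformers library. The prompt is my own invention, and real chatbots are far larger and sample from a probability distribution rather than always taking the top word, but the underlying mechanism is the same:

```python
# A minimal sketch of the "next word" loop, using the small open-source GPT-2
# model via the Hugging Face transformers library. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The teacher walked into the classroom and"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(20):  # extend the text one token at a time
    with torch.no_grad():
        logits = model(input_ids).logits   # a score for every token in the vocabulary
    next_id = logits[0, -1].argmax()       # greedily pick the most likely next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```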

It’s remarkable how these models can generate responses that convincingly mimic human-like reasoning. While experts anticipate that their language capabilities will continue to improve (perhaps even to the point of generating new theories, ideas, products, and books), it’s essential to note that they don’t reason about or ‘understand’ a school subject in quite the same way human teachers do.

When I encounter scepticism about the intelligence of large language models, I often bring up the adjective order rule in English. This rule dictates the order in which adjectives should appear: Quantity, opinion, size, age, shape, colour, origin/material, and qualifier. For instance, the correct phrasing would be ‘two beautiful, small, black dogs’.
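
To show how explicit this schema is once written down, here is a toy Python illustration. The mini-lexicon is invented for the example, and of course no LLM stores anything like this table:

```python
# Toy illustration: the adjective-order rule written down as explicit data,
# i.e. the codified schema that native speakers (and LLMs) get by without.
ADJECTIVE_ORDER = ["quantity", "opinion", "size", "age",
                   "shape", "colour", "origin/material", "qualifier"]

# Invented mini-lexicon for the example.
CATEGORY = {"two": "quantity", "beautiful": "opinion",
            "small": "size", "black": "colour"}

def order_adjectives(adjectives):
    """Sort adjectives into the conventional English order."""
    return sorted(adjectives, key=lambda a: ADJECTIVE_ORDER.index(CATEGORY[a]))

print(" ".join(order_adjectives(["black", "two", "small", "beautiful"])) + " dogs")
# -> "two beautiful small black dogs"
```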

This rule serves as an interesting example to reflect on the nature of ‘intelligence’ and ‘knowledge’. Native English speakers intuitively follow this rule without being explicitly aware of it, while non-native speakers often learn it formally. Native speakers sometimes unknowingly break the rule for effect or emphasis, a nuance often challenging for non-native speakers to replicate. This discrepancy in grammatical expertise challenges our perceptions of who is more ‘knowledgeable’.

Similarly, large language models demonstrate an impressive ability to coherently arrange information, even without an explicit understanding of underlying rules or structures. Their models might not possess the explicit schema that a subject expert would have, and yet they generate text that is consistent with the text a subject expert might compile. I like to think of LLMs as akin to the generation (i.e. me) who were taught no English grammar at all at school, and yet can write and speak very accurately indeed!

In language-focused disciplines, these models appear extraordinarily capable, even excelling in exams. However, their reliability is not absolute: they are prone to errors, which presents a challenge for teachers who need to ensure factual accuracy in their work. Thankfully, the probability of an LLM producing a factual error is not random; understanding how errors arise in response to prompts can help us determine when LLMs will be useful. Here are some reasons why their pattern-spotting behaviour goes wrong, with implications for teachers:

  • Training Data Inaccuracies: The models are trained on a vast array of books and web pages. If the model’s training data includes contested or inconsistent information, the output may be unreliable. This is particularly relevant for subjects like religious studies and ethics.
  • Contextual Limitations of the Prompt: Poorly framed prompts often yield incorrect or irrelevant responses. Effective communication with the AI model is therefore crucial, and it takes time to learn how to do this well (see the sketch after this list).
  • Model Oversimplification: While these models are sophisticated, they can sometimes oversimplify complex topics, particularly those that are highly specialised or nuanced.
  • Outdated Information: Current models were often trained on data up to 2021, posing challenges in subjects that require up-to-date information or where human knowledge is fast changing (e.g. global warming).
  • Ambiguity of the English Language: Variations in language use and word meaning across different cultural and technical contexts can lead to model confusion.
  • Hallucinations: In some instances, the model might generate text that seems nonsensical or factually incorrect, particularly where no next word has a high probability in its model (known as long-tail events). This is particularly likely when the model encounters unfamiliar or rarely discussed topics, such as a modern novel.
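
To make the point about prompt framing concrete, here is a hedged sketch using the OpenAI Python client. The two prompts and the model name are invented for the example; the framed version simply gives the model the context a colleague would need:

```python
# Illustrative only: contrasting a vague prompt with a well-framed one,
# using the OpenAI Python client. The model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

vague_prompt = "Write some questions about photosynthesis."

framed_prompt = (
    "You are a GCSE biology teacher. Write five short-answer questions on "
    "photosynthesis for a mixed-ability Year 10 class. Each question should "
    "target one step of the process and be answerable in one or two sentences."
)

for name, prompt in [("vague", vague_prompt), ("framed", framed_prompt)]:
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder: any capable chat model
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {name} prompt ---")
    print(response.choices[0].message.content)
```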

While these models are not infallible, their errors are reasonably predictable, allowing experienced users to gauge their reliability in specific contexts and make informed decisions about their applicability.

Will AI Assist or Replace Teacher Lesson Planning?

While there’s potential for AI to aid in lesson planning, replacing human teachers in this domain seems unlikely at present, simply because human teachers both want and need to plan lessons. AI technologies display apparent proficiency with subject matter and pedagogical techniques, often generating intriguing lesson ideas. However, according to Teacher Tapp, 72% of teachers craft their own lesson plans as a way to internalise and make sense of the subject matter they’re going to teach. This percentage is even higher among KS2 primary teachers and secondary English and humanities teachers, who often have to cover expansive knowledge domains.

This need—or desire—for teachers to personally engage in lesson planning poses a challenge for traditional external curriculum providers, such as Oak National or MAT central teams. Regardless of how impeccable their resources might be, teachers still require a hands-on understanding of the material. (Furthermore, teachers generally find lesson planning to be the most enjoyable aspect of their non-teaching duties.)

So, is there room for AI to streamline the lesson-planning process? Absolutely, and it’s already happening. Teachers are integrating AI tools into their planning, treating them as professional allies. They can seek advice on the challenges students may encounter or collaboratively brainstorm teaching strategies. AI excels at automating mundane tasks like writing instructions or generating sample texts, thereby freeing up teachers for more creative and nuanced work.

However, AI’s capacity to accelerate lesson planning isn’t confined to individual teachers; it will also enhance the efficiency of commercial curriculum development, making it economically viable for curriculum companies to develop resources for niche topics and subjects that were previously unprofitable. Given the co-dependence of teacher planning and teacher instruction, whether these AI-generated plans and resources make their way into classrooms will largely hinge on the evolving landscape of educational instruction, to which we now turn.

Disruption to Classroom Instruction?

Attempts to replace or supplement teacher instruction with paper-based or technological alternatives have a long history. Personally, I experienced self-directed learning in mathematics through SMP booklets as a child in the 1980s. Later, as a teacher, I supervised a Year 12 careers class where students worked through an online platform.

In recent years, numerous studies have evaluated the impact of replacing traditional teaching methods with computer-assisted instructional platforms. While I’m not particularly interested in research concerning the average effectiveness of these platforms — I expect their capabilities to rapidly evolve with the advancement of LLMs — I’d like to highlight a study that uncovers pertinent policy implications. Eric Taylor’s research examines the effect of computer-assisted platforms on the variation in teacher quality, as measured by improvements in class test scores. The study finds that the range in teacher effectiveness narrows; less effective teachers improve, while more effective teachers perform less well.

Three key takeaways for us emerge from such a study:

1. Optimistically, we could argue that this research provides a policy pathway to address future teacher shortages in subjects like maths and physics by employing non-specialist teachers who are supported by computer-assisted platforms.

2. On the ethical front, in the future one might argue that it’s our moral obligation to mandate the use of computer-assisted platforms for teachers who cannot demonstrate high efficacy. This could optimise student learning but also raises critical questions. How do we identify and subsequently treat these ‘second-tier’ teachers within the school system? What implications does this have for early-career teachers who are typically less effective in their first year? If required to use such platforms, when will they learn to teach without technological assistance?

3. Finally, while Taylor’s study indicates a group of teachers who perform better without the assistance of computer platforms, the size of this group may shrink as these platforms improve and as the teaching profession continues to face recruitment challenges. Do we then have an ethical duty to ensure all teachers use this technology, and if we do, what happens to the nature of the teaching profession?

For now, this is all speculation that is a little way off, but I think we can all glimpse a future in which there are serious moral and ethical debates about the rights of students and the freedoms of teachers as professionals.

One Last Attempt to Get Personalisation Right?

Traditional classroom teaching inherently limits how well a teacher can address the individual needs of each student, especially in large classes. In contrast, generative AI, implemented in classrooms or homework platforms, offers the potential to more effectively cater to individual needs.

For educators who have been teaching for decades, the word “personalisation” might elicit a sense of caution. This scepticism often arises from technologists’ unrealistic expectations of what personalisation can accomplish. For instance, catering to students’ learning styles or personal interests has been shown to neither enhance learning nor lead to a balanced curriculum.

However, personalisation in computer-assisted platforms usually means adjusting the pace of a student’s progress through a common curriculum based on their level of understanding. If a system can assess and remember what a student knows and can do, it essentially becomes more like a one-to-one tutor, providing explanations and practice activities when needed. 
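
As a toy sketch of what that pacing logic might look like, consider the following. The topics, threshold, and update rule are all invented for illustration and bear no relation to any real platform’s algorithm:

```python
# A toy sketch of mastery-based pacing: track what each student has secured
# and serve practice on the earliest unsecured topic in a common curriculum.
# Everything here (topics, threshold, update rule) is invented for illustration.

CURRICULUM = ["place value", "addition", "subtraction", "multiplication"]
MASTERY_THRESHOLD = 0.8  # assumed cut-off for "secure"

class StudentModel:
    def __init__(self):
        # estimated probability the student has mastered each topic
        self.mastery = {topic: 0.0 for topic in CURRICULUM}

    def update(self, topic: str, correct: bool) -> None:
        # crude moving-average update; real systems use far richer models
        old = self.mastery[topic]
        self.mastery[topic] = 0.7 * old + 0.3 * (1.0 if correct else 0.0)

    def next_topic(self) -> str:
        # pace through the common curriculum at the student's own level
        for topic in CURRICULUM:
            if self.mastery[topic] < MASTERY_THRESHOLD:
                return topic
        return CURRICULUM[-1]  # everything secure: consolidate the last topic

student = StudentModel()
student.update("place value", correct=True)
print(student.next_topic())  # still "place value" until it is secure
```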

Striking the right balance between whole-class teaching and individualised practice will depend on many factors, not least the nature of the subject’s knowledge domain. A case in point is Carnegie Learning’s blended learning maths platform, which integrates workbooks, teacher input, and computer-assisted learning. A study showed that students performed better when the platform’s algorithm dictated their individual practice episodes, rather than the teacher setting practice work based on the whole class’s current topic and level. This insight is particularly relevant to subjects with hierarchical knowledge domains, like maths and language, where traditional homework-setting approaches might be intuitive and yet suboptimal.

In such hierarchical subjects, meeting students’ needs as they arise is essential. By contrast, subjects like English literature and history pose different challenges for AI: they rely less on hierarchical structures in which prior knowledge is critical, and their curriculum diversity and goal ambiguity make it harder for commercial companies to create learning platforms. In other words, AI platforms are likely to be harder to create and less useful to students. However, the ability of generative AI to rapidly create new resources could soon make these platforms more prevalent (if marking challenges can be addressed – see below). I imagine they will be used to assist with homework long before we see them used in humanities and English classrooms.

In summary, while the term ‘personalisation’ carries historical baggage, modern AI offers a new and more effective approach, one that might well warrant revisiting old assumptions about teaching and homework, especially in subjects where mastery learning matters.

What Type of Work Will AI Be Able to Mark?

Marking is often cited as one of the least enjoyable tasks for teachers. Yet our survey on Teacher Tapp revealed a surprising sentiment: 4-in-10 teachers said they would prefer to handle most of their marking, even if given the option to delegate it to another teacher. This leads us to question whether AI can support teachers in marking tasks whilst giving them sufficient information to teach responsively. My personal view is that our goal should be to create summaries of marking for teachers that allow them to respond to student needs without spending hours marking every evening.

How far are we from assisting teachers with marking? Daisy Christodoulou and Chris Wheadon at No More Marking have argued that large language models are not yet ready to fully replace human judgment in marking tasks that are highly open-ended. Their company’s comparative judgement approach asks teachers to “pick the better script” without specifying what ‘better’ means. Where there is such a large range of word patterns that humans might want to uphold as ‘good’, and yet humans cannot articulate what language patterns to look for, it isn’t particularly surprising that an LLM does quite a poor job. So I too am pessimistic about LLM judging of highly open and creative tasks (though note that these models have already slashed human workload by providing an initial sort of the scripts). Ultimately, if we have no language-based framework for articulating what a ‘good’ response looks like, then it is much more difficult (though not impossible) for us to train a language-based model to identify a ‘good’ response!

However, LLMs could excel in tasks that can be graded against a well-defined rubric. In subjects with structured knowledge domains, such as the sciences, teachers are already finding some success. The key is to write a ‘watertight’ rubric, one that leaves little room for dispute. Thankfully, LLMs are excellent partners to help teachers write good assessment rubrics or mark schemes! Furthermore, we can guard against the excesses of gaming rubrics by simply not revealing them to students.
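
For instance, here is a hedged sketch of rubric-based marking using the OpenAI Python client; the rubric, student answer, and model name are all invented for illustration:

```python
# A hedged sketch of rubric-based marking with an LLM via the OpenAI client.
# The rubric, answer, and model name are invented; a real deployment would
# need careful checking of outputs before anything reached students.
from openai import OpenAI

client = OpenAI()

rubric = """Award up to 3 marks:
1 mark: states that chlorophyll absorbs light.
1 mark: names carbon dioxide and water as the reactants.
1 mark: names glucose and oxygen as the products."""

student_answer = ("Plants use light, which chlorophyll absorbs, "
                  "to turn carbon dioxide and water into glucose.")

prompt = (
    "Mark the following answer strictly against this rubric. For each rubric "
    "point, quote the words that earn it (or say 'not present'), then give a "
    f"total out of 3.\n\nRUBRIC:\n{rubric}\n\nANSWER:\n{student_answer}"
)

response = client.chat.completions.create(
    model="gpt-4",  # placeholder: any capable chat model
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```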

For somewhat open assessment tasks where we want to award marks and accept a much broader range of responses (e.g. an expository essay), I am much less certain how far and how fast AI will get in helping us mark since it isn’t possible to write watertight rubrics (though we can write good ones). Human graders often rely on intuition, experience, and an understanding of contextual nuances that are difficult to codify into a rubric. In these areas, we are likely to supplement rubrics by asking the LLM to mimic the human behaviour of somebody who has already marked and reviewed a set of scripts. Furthermore, if the knowledge domain is not well codified (e.g. a modern fiction novel), we can try uploading large amounts of text to the AI model to inform or constrain how information is utilised.
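
One illustrative way to set up that ‘mimic the human marker’ behaviour is a few-shot prompt: show the model scripts a human has already marked, then ask it to continue in the same style. The exemplar essays and marks below are invented:

```python
# A sketch of "few-shot" marking: show the model scripts a human has already
# marked, then ask it to mark a new one in the same style. The exemplars are
# invented; the assembled prompt is sent exactly as in the previous sketch.
marked_exemplars = [
    ("The war started because of the assassination.",
     "2/6 - identifies the trigger but no long-term causes"),
    ("Alliances and militarism made war likely; the assassination was the spark.",
     "5/6 - links long- and short-term causes"),
]

new_script = "The alliance system meant one assassination dragged every power in."

parts = ["Here are essays a senior examiner has already marked:"]
for essay, judgement in marked_exemplars:
    parts.append(f"ESSAY: {essay}\nMARK: {judgement}")
parts.append(f"Now mark this new essay in the same style:\nESSAY: {new_script}")

prompt = "\n\n".join(parts)
print(prompt)  # send as the user message, as in the sketch above
```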

The necessity for human involvement in checking an LLM’s marking depends on both the probability of error and the importance of the assessment. In low-stakes settings, where the cost of an occasional mistake is small, AI might in the future offer quick, albeit occasionally imperfect, feedback that is more beneficial to students than waiting for human evaluation.

As we embark on this journey to understand large language models’ behaviour in specific assessment tasks, it is crucial not to fall into the trap of claiming AI can or cannot perform certain tasks without understanding how the models work and under what circumstances we can make them mimic our desired behaviour. I hope teachers will be transparent in sharing their experiences of marking using AI via blogs and presentations since commercial assessment companies are unlikely to publicly share their approaches.

Using AI to Motivate Students

Finally, I want to explore whether large language models can help motivate students. Online education works well for motivated adults, replacing so much real-life learning in our work and personal lives. However, children and teenagers differ in their developmental stages and are often required to study things that are not of their choosing! I used to believe technology could never replace human teachers due to the motivation issue, but my stance has softened a little. This shift is due to three aspects of AI bots powered by large language models.

Firstly, they can provide immediate feedback, acting as a behavioural ramp for students working through tasks. This tight feedback mechanism that we see in online games is highly motivating, especially for those with attention difficulties. Secondly, AI bots could potentially use persuasive language better than humans. Studies show AI doctors offer more useful advice and are rated as more empathetic than their human counterparts. AI doctors lack their own emotional needs – they don’t feel frustration or impatience! In education, AI can leverage emotive and persuasive language to coax students into studying as well as, or even better than, human teachers. Lastly, one-to-one human tutors often motivate students by finding their “headache” – the reason they need to learn something. AI tutors could similarly find entry points into difficult topics by linking them to a student’s interests in innovative ways.

These three techniques – tight behavioural ramp, persuasive language, and discovering the headache – are areas where humans excel in one-to-one situations but struggle in classroom settings. It will be interesting to see how computer-assisted learning platforms develop in motivating students to study. Will classrooms always be necessary? Maybe, but perhaps integrating AI bots within teachers’ classrooms could enhance motivation more effectively than either the human teacher or the technology platform working alone.

Conclusion

I cannot see into the future any more than you can and I am sure there are things I’ve written here you will disagree with. That’s OK – we’ll soon find out who is right about the future! I am probably a little more optimistic about how quickly LLMs are going to assist teaching because I have seen how much they transform work in my day job at Teacher Tapp.

To believe LLMs will change teaching, we don’t have to look into the future. We merely have to look at the present, where over a third of teachers are already using them to help with their school work and where some innovative teachers are training them to mark.

For those of you who teach a well-codified language-based subject that hasn’t been as well-served by technology as maths…? Well, strap yourself in because I suspect we are about to go on an interesting ride!
