With only two days left until the TAG deadline, Mark Enser turns his attention to the future and how this recent experience should help the sector design effective and efficient assessments in the future…
We use assessments for two main purposes. We use them formatively to find out what pupils know, understand, and can do, in the hope that we can then use this information to support them further. We also use assessments summatively to reach a judgement about what pupils know, understand, and can do in comparison to their peers, either within a school or nationally, so that this information can be reported and used for qualifications or internal data tracking.
Creating either formative or summative assessments is difficult, but creating summative assessments is really difficult. This is something that has been laid bare over the last year by the process of creating Teacher Assessed Grades (TAGs). Although this has been a gruelling and frustrating process, it has highlighted a few important lessons about summative assessment design that we can use in the future.
1. You don’t know how hard an assessment is
By this time last year, most people working in schools were already aware that not only would exams in 2020 have to change but those in 2021 as well; the experience of pupils across the country was too wildly different for them all to be fairly assessed against each other. The first proposal to change this year’s exams involved removing content from the papers of many subjects. The problem was that schools had all covered different parts of the course – a school that had covered material that was now removed would see its pupils disadvantaged. This led to calls for optionality to be introduced into the papers, with pupils given the choice over which topics to answer. On the surface this sounds sensible, but in practice it doesn’t work, as not all parts of an exam paper are equally difficult. Some parts test different Assessment Objectives (AOs) and some topics are simply more complicated than others.
This lesson can be useful to us in schools as it reminds us that we can’t look for progress between assessments, as those assessments will be of varying difficulty. If professional exam boards can’t ensure each assessment is of a predictable level of difficulty, we certainly can’t do so ourselves. If a pupil gets 60% on their first assessment and 57% on their next, it doesn’t mean they have “gone backwards” – it might mean the assessment was simply more difficult.
2. Don’t assess activity
When the arrangements for TAGs were announced, it seemed as though schools could use almost anything as evidence for how they reached their judgement, including things like classwork. The problem is that the work pupils do in class consists of activities designed to help them learn, not activities designed to demonstrate learning. When we create learning activities, we tend to provide the materials pupils need to answer questions, beyond that which is in their heads. They use these materials to generate learning as they think about them and answer questions. When we want pupils to demonstrate learning, we remove those additional materials – or at least we should. I think we are sometimes so worried about pupils “failing” that we make it impossible to really assess what they can do without support.
Make sure assessments really differentiate between those who can do what you want them to be able to do and those who can’t. Otherwise, it isn’t working as an assessment.
3. Beware construct irrelevance
A big problem with many assessments is the inclusion of what is termed construct irrelevance. The construct in the assessment is the thing that you actually want to assess; the irrelevance is what you end up assessing instead. For example, a maths assessment might wrap the problem in a complex written explanation, so that you end up assessing pupils’ literacy rather than their mathematical ability. This can be a particular problem when trying to create what feel like more “authentic” assessments – ones that reflect real-world tasks – because these kinds of tasks involve a whole plethora of skills. This is fine, as long as you are comfortable with assessing all of those different skills when reaching a judgement.
Create a list of what you think pupils have been taught to do in your curriculum – assess that.
4. Curriculum and assessment talk to each other
Construct irrelevance also teaches us another lesson: that there should be movement between curriculum and assessment. It could be that what at first seems to be construct irrelevance, such as the ability to read and comprehend the task instruction, is something you feel should be assessed – in which case it should be something that you actually teach. This is something that TAGs have really highlighted. Being immersed in the exam papers and their mark schemes has shown that there are areas of our curriculum that should be strengthened, as we are not always teaching those things that seem important enough to assess.
Look back at the list you created showing what you think people have been taught in your curriculum – have they actually been taught this? Are there things you wish you had taught them? Adjust your curriculum accordingly.
5. Avoid bias
One of the most uncomfortable truths revealed by TAGs is the extent of unconscious teacher bias in grading. No one wants to think they are affected by gender, ethnicity or background when marking a pupil’s work, but that is the problem with unconscious bias. Ofqual have presented the research into this and it is fairly grim reading. Whilst being aware of the problem is a great start, the only way round it is to ensure that people don’t know whose papers they are marking. This has the added benefit of speeding up the marking process. When you mark the assessment of a pupil you know, you inevitably find yourself thinking about the answers in the context of that pupil (which creates the opportunity for bias in the first place). This is great when doing assessment for formative purposes, as it gives you personalised information that you can use, but it is detrimental to speedy summative marking.
Swap papers and/or ask pupils to put a randomised number on their paper instead of their name. Mark papers for summative purposes differently than you would formatively. Don’t spend time making corrections and annotating.
6. Be honest
One of the biggest problems with TAGs is that we need to try to create a nationally standardised grade from the information we have from our own cohort. This is impossible, but weirdly it is something many schools try to do all the time anyway. They set an assessment and then award it some sort of grade, based on grade boundaries for national assessments sat in very different circumstances, or, worse yet, try to infer from the score what pupils will get in the future. TAGs have helped to show us just how meaningless this all is. If we want to be honest with our summative assessments, we can be. We can report what we actually know from the assessment: that the average score was X and pupil Y got Z, or that pupil Y got Z, which puts them in the Xth percentile.
A well-designed assessment can tell you whether pupils are learning what you wanted them to learn and how this compares to other people who did the same assessment. Making wider inferences about progress is likely to be flawed. So report what you know.
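For schools that track results in a spreadsheet or script, the kind of honest report described above – a raw score, the cohort average and a percentile rank – is straightforward to compute. A minimal sketch, with an invented cohort of scores for illustration:

```python
def percentile_rank(scores, score):
    """Percentage of cohort scores strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Invented cohort of raw assessment scores (out of 100)
cohort = [45, 52, 57, 60, 63, 70, 74, 78, 81, 90]

average = sum(cohort) / len(cohort)
pupil_score = 60

# Report only what the assessment actually tells us
print(f"Cohort average: {average}")  # 67.0
print(f"Pupil scored {pupil_score} "
      f"({percentile_rank(cohort, pupil_score):.0f}th percentile)")  # 30th
```

Note that this deliberately stops at the score and the percentile – it makes no attempt to convert either into a predicted national grade, which is exactly the inference the section above warns against.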
Mark Enser is head of geography and research lead at Heathfield Community College. He is also a TES columnist and author. He tweets at @EnserMark