Marking as a sequence: keeping consistency

In this section, marking refers to an individual marking a series of scripts. Maintaining consistency across a number of essays is known to be challenging: essays are complex to mark because they are responses to an open question, and this poses challenges to reliability (Brown, 2001).

The challenge of maintaining consistency described by NTU colleagues, as elsewhere, stems from the complexity of judging variable student responses against an abstract set of descriptors or achievement criteria. This form of marking, criterion-referenced marking, is currently the most common assessment framework in HE. It is best understood as an alternative to norm-referenced assessment, in which students’ performances are marked against each other (Miller, Imrie and Cox, 1998).

NTU colleagues were aware of the difficulty of keeping consistency when marking diverse student responses. Many factors impinge on marking, and maintaining consistency is a known challenge (Hughes, Keeling and Tuck, 1980; Spear, 1996). In response, several strategies have been identified; these are known as self-monitoring strategies (Cumming, Kantor and Powers, 2002). The monitoring strategies that groups engage in are discussed in Moderation of marking.

Interviews with NTU colleagues elicited some of the self-monitoring strategies they applied while marking a series of scripts (self-monitoring can take place before, during and after marking). Achieving consistency through self-monitoring involves additional effort that is difficult to quantify. The strategies reported below are not mutually exclusive, and different colleagues reported using one or more of the following:

Pre-marking

  • start marking and then review: start reading and marking some scripts, then review the marking criteria
  • sorting scripts by bands of achievement: pre-sorting scripts according to a first impression of the quality of the work

During marking

  • marking in sets: read and construct feedback for a set of scripts rather than one script at a time
  • comparing to other students’ work during marking: compare students’ work to check performance
  • marking across the range of marks: check the distribution of marks
  • using the marking criteria

Post-marking

  • comparisons and checks: similar to the checks above, but made at the end of marking
  • no self-moderation: no checks apart from the marking criteria

Reflecting on approaches to self-monitoring
In this section the approaches described above are discussed, drawing on key messages from the literature.

Standards are more important than criteria
Strictly speaking, in criterion-referenced assessment (i.e. marking against criteria rather than against the performance of other students), the only check on consistency should be made using the marking criteria. However, the strategies described by practitioners (at NTU and elsewhere) reveal one of the major known flaws of criterion-referenced assessment, namely the difficulty of articulating criteria of achievement in relation to knowledge and understanding (Sadler, 1989; Miller, Imrie and Cox, 1998; Dunn, Parry and Morgan, 2002). Descriptors of quality and achievement that are sufficiently specific are usually difficult to write (Carlson et al., 2000). Additionally, criteria generally specify what is to be judged, whilst standards specify the level of performance required against those criteria. This leaves a gap in the application of criteria and relies to a large extent on practitioners’ ability to exercise professional judgement over standards of quality. Most of the strategies described above reveal a search for standards and terms of comparison.

Sadler (1989) argues that the root of the problem is that the terms criteria and standards are sometimes used interchangeably and that, in practice, the application of criteria varies. Self-monitoring strategies are compensatory: they seek to establish a clear and identifiable benchmark (a standard of performance) prior to making a judgement. Many authors agree that standards-referenced assessment would enhance reliability (Sadler, 1987; Carlson et al., 2000; Dunn, Parry and Morgan, 2002).

Effectiveness and practical aspects of self-monitoring
The fundamental rationale for using many of the self-monitoring strategies described above is that they are necessary and effective. However, this assumption needs to be given due consideration.

A common self-monitoring strategy (at NTU and elsewhere) consists of sampling a few scripts to gauge the standard of student performance. Establishing a standard of performance would, by definition, require sampling a large number of scripts from many different years; deriving it from a small sample in a single year is flawed.
Another self-monitoring strategy is “marking across the range” of marks. This strategy was also reported when marking small samples (e.g. up to 30 students). In principle, the notion is contrary to the criterion-referenced assessment framework. Moreover, the assumption is flawed: statistically, the marks of a small sample may well be skewed rather than spread across the full range (Sadler, 1987).
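
To illustrate the statistical point, the rough sketch below simulates a marker’s batch in Python. The mark distribution is an assumption made purely for illustration, not data from NTU or the cited studies; it simply shows how a small batch typically fails to span the full range of marks.

    import random
    import statistics

    random.seed(1)

    # Illustrative assumption: underlying performance is roughly normal around a
    # mid-range mean; this distribution is not taken from the source.
    def sample_marks(n, mean=62, sd=9):
        return [max(0, min(100, round(random.gauss(mean, sd)))) for _ in range(n)]

    small_batch = sample_marks(30)    # one marker's batch of scripts
    large_pool = sample_marks(3000)   # many cohorts' worth of scripts

    for label, marks in (("n=30", small_batch), ("n=3000", large_pool)):
        print(label, "min:", min(marks), "max:", max(marks),
              "mean:", round(statistics.mean(marks), 1))

    # The 30-script batch usually spans a narrower range than the large pool, so
    # expecting every small batch to show marks "across the range" is unsafe.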

The self-monitoring strategies imply checking the detail of students’ work, and this entails additional effort. As to whether this effort enhances the reliability of judgements, research suggests that spending more time on a judgement does not improve reliability and probably results in lower grades being awarded. Additionally, more time spent marking may lead to a focus on less relevant aspects of the work that do not contribute to the overall grade (McColly, 1970).

What alternatives might work to enhance intra-marker consistency?
The literature suggests that particular self-monitoring strategies might be most effective at the beginning of marking (Brown, 2001). A clear set of exemplars for different bands of achievement, or a set of specific descriptors of different standards of achievement, might support lecturers’ decision-making (Sadler, 1989; Brown, 2001). Ahmed and Pollitt (2011) provide examples of marking schemes that specify weightings for different requirements.

Another strategy identified as enhancing between- and within-rater reliability is analytic marking (Barkaoui, 2011). Analytic marking contrasts with holistic marking: a separate mark is awarded for each criterion, and the final mark is then derived from these element marks. This might support inter- and intra-marker consistency when dealing with essay-format answers, and may offer markers an alternative that minimises the reported loops of checking and wading back through scripts.
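
As a minimal sketch of how element marks might be combined under analytic marking (the criteria and weightings below are hypothetical examples, not an NTU or published scheme):

    # Hypothetical analytic mark scheme: criterion -> (weighting, mark out of 100).
    # The criteria and weightings are illustrative assumptions only.
    scheme = {
        "argument":        (0.40, 68),
        "use of evidence": (0.30, 62),
        "structure":       (0.20, 58),
        "referencing":     (0.10, 72),
    }

    def analytic_mark(scheme):
        """Combine per-criterion marks into one final mark using the stated weightings."""
        total_weight = sum(weight for weight, _ in scheme.values())
        assert abs(total_weight - 1.0) < 1e-9, "weightings should sum to 1"
        return round(sum(weight * mark for weight, mark in scheme.values()))

    print(analytic_mark(scheme))  # 0.4*68 + 0.3*62 + 0.2*58 + 0.1*72 = 64.6, rounds to 65

The marker’s judgement still lies in awarding each criterion mark; only the way the marks are combined is fixed in advance.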


References
AHMED, A. and POLLITT, A., 2011. Improving marking quality through a taxonomy of mark schemes. Assessment in Education: Principles, Policy & Practice, 18(3), pp. 259-278.

BARKAOUI, K., 2011. Effects of marking method and rater experience on ESL essay scores and rater performance. Assessment in Education: Principles, Policy & Practice, 18(3), pp. 279-293.

BROWN, G., 2001. Assessment: A guide for lecturers. LTSN Assessment series. Available at: http://www.bioscience.heacademy.ac.uk/ftp/Resources/gc/assess03Lecturers.pdf [Accessed 15 September 2011].

CARLSON, T., MACDONALD, D., GORELY, T., HANRAHAN, S. and BURGESS-LIMERICK, R., 2000. Implementing criterion-referenced assessment within a multi-disciplinary university department. Higher Education Research & Development, 19(1), pp. 104-116.

CUMMING, A., KANTOR, R. and POWERS, D. E., 2002. Decision making while rating ESL/EFL writing tasks: a descriptive framework. The Modern Language Journal, 86, pp. 67-96.

DUNN, L., PARRY, S. and MORGAN, C., 2002. Seeking quality in criterion referenced assessment. Proceedings of the Learning Communities and Assessment Cultures Conference, University of Northumbria, 28-30 August. Available at: http://www.leeds.ac.uk/educol/documents/00002257.htm [Accessed 4 October 2011].

HUGHES, D. C., KEELING, B. and TUCK, B. F., 1980. The influence of context position and scoring method on essay scoring. Journal of Educational Measurement, 17, pp. 131-135.

McCOLLY, W., 1970. What does educational research say about the judging of writing ability? Journal of Educational Research, 64, pp. 147-156.

MEADOWS, M. and BILLINGTON, L., 2005. A review of the literature on marking reliability. AQA Research Paper RPA_05_MM_RP_05.

MILLER, A. H., IMRIE, B. W. and COX, K., 1998. Student Assessment in Higher Education: A Handbook for Assessing Performance. London: Kogan Page.

SADLER, D. R., 1987. Specifying and promulgating achievement standards. Oxford Review of Education, 13(2), pp. 191-207.

SADLER, D. R., 1989. Formative assessment and the design of instructional systems. Instructional Science, 18, pp. 119-144.

SPEAR, M., 1996. The influence of halo effects upon teachers' assessments of written work. Research in Education, 56, pp. 85-87.

 
