Marking as a sequence: keeping consistency
In this section, marking refers to the marking of a series of scripts by an individual. Maintaining consistency while marking across a number of essays is known to be challenging. Essays are complex to mark because they are responses to an open question, and this poses challenges to reliability (Brown, 2001).
The challenge of maintaining consistency described by NTU colleagues, as elsewhere, stems from the complexity of judging variable responses from students against an abstract set of descriptors or achievement criteria. This form of marking, criterion-referenced marking, is currently the most common assessment framework in HE. It is best understood as an alternative to norm-referenced assessment, in which students’ performances are marked against each other (Miller, Imrie and Cox, 1998).
NTU colleagues expressed awareness of the difficulty of maintaining consistency when marking diverse student responses. Many factors impinge on marking, and maintaining consistency is a known challenge (Hughes, Keeling and Tuck, 1980; Spear, 1996). In response to this challenge, several strategies have been identified; these are known as self-monitoring strategies (Cumming, Kantor and Powers, 2002). The monitoring strategies that groups engage in are discussed under the moderation of marking.
Interviews with NTU colleagues served to elicit some of the self-monitoring strategies they applied while marking a series of scripts (self-monitoring can take place before, during and after marking). Achieving consistency through self-monitoring involves additional effort that is difficult to quantify. The strategies reported below are not mutually exclusive, and different colleagues reported using one or more of the following:
Pre-marking
- starting to mark and then reviewing: reading and marking some scripts, then reviewing the marking criteria
- sorting scripts by bands of achievement: pre-sorting scripts according to a first impression of the quality of the work
During marking
- marking in sets: reading and constructing feedback for a set of scripts (rather than one at a time)
- comparing to other students’ work during marking: comparing students’ work to check performance
- marking across the range of marks: checking the distribution of marks
- using the marking criteria
Post-marking
- comparisons and checks: similar to the checks above but carried out at the end of marking
- no self-monitoring: no checks apart from the marking criteria
Reflecting on approaches to self-monitoring
In this section, the approaches described above are discussed, drawing on key messages from the literature.
Standards are more important than criteria
Strictly speaking, in criterion-referenced assessment (i.e. marking against criteria rather than against the performance of other students), the requirement is that the only check on consistency is made using the marking criteria. However, the strategies described by practitioners (at NTU and elsewhere) reveal one of the major known flaws of criterion-referenced assessment, namely the difficulty of articulating criteria of achievement in relation to knowledge and understanding (Sadler, 1989; Miller, Imrie and Cox, 1998; Dunn, Parry and Morgan, 2002). Descriptors of quality and achievement that are sufficiently specific are usually difficult to write (Carlson et al., 2000). Additionally, criteria generally relate to what is measured, whilst standards relate to how these criteria will be measured. This leaves a gap in the application of criteria and relies to a large extent on the ability of practitioners to exercise professional judgement over standards of quality. Most of the strategies described above reveal a search for standards and terms of comparison.
Sadler (1989) argues that the root of the problem is that the terms criteria and standards are sometimes used interchangeably, and that in practice the application of criteria varies. Self-monitoring strategies are compensatory in that they seek to establish a clear and identifiable benchmark (a standard of performance) prior to making a judgement. Many authors agree that standards-referenced assessment would enhance reliability (Sadler, 1987; Carlson et al., 2000; Dunn, Parry and Morgan, 2002).
Effectiveness and practical aspects of self-monitoring
The fundamental rationale for using many of the self-monitoring strategies described above is that they are necessary and effective. However, this assumption needs to be given due consideration.
A common self-monitoring strategy (at NTU and elsewhere) consists of sampling a few scripts to gauge the standard of student performance. By definition, establishing the standard of performance would require sampling a large number of scripts from many different years. Deriving the standard of performance from a small sample in a given year is therefore flawed.
Another self-monitoring strategy is “marking across the range” of marks. This strategy was also reported when marking small samples (e.g. a sample of up to 30 students). In principle, this notion is contrary to the criterion-referenced assessment framework. Moreover, the assumption is flawed because, statistically speaking, the marks in a small sample may well be skewed (Sadler, 1987).
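Purely as an illustration of this statistical point, the short sketch below (not part of the original guidance) simulates drawing several batches of 30 scripts from a hypothetical cohort of 1,000 marks; the cohort parameters and batch size are assumptions chosen for the example.

```python
# Illustrative sketch only: why a small batch of scripts need not span the full
# range of marks. The cohort of "true" marks and the batch size are assumptions.
import random
import statistics

random.seed(1)

# Hypothetical cohort of 1,000 marks, roughly centred on the high 50s.
population = [min(100, max(0, round(random.gauss(58, 12)))) for _ in range(1000)]

for trial in range(5):
    batch = random.sample(population, 30)  # one marker's pile of 30 scripts
    print(f"batch {trial + 1}: min={min(batch)}, max={max(batch)}, "
          f"mean={statistics.mean(batch):.1f}")

# Each batch covers a different, often narrower, slice of the cohort's range,
# so expecting every small set of scripts to "cover the range" is unrealistic.
```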
The self-monitoring strategies involve checking the detail of students’ work and therefore entail additional effort. In judging the effectiveness of this effort for enhancing the reliability of judgements, research suggests that spending more time marking in order to make a judgement does not enhance reliability and, further, probably results in lower grades being awarded. Additionally, more time spent marking may lead to a focus on less relevant aspects of the work that might not contribute to the overall grade (McColly, 1970).
What alternatives might work to enhance intra-marker consistency?
The literature suggests that particular self-monitoring strategies might be most effective at the beginning of marking (Brown, 2001). The use of a clear set of exemplars of different bands of achievement, or a set of specific descriptors of different standards of achievement, might support the decision-making of lecturers (Sadler, 1989; Brown, 2001). Ahmed and Pollitt (2011) provide examples of marking schemes that specify weightings for different requirements.
Another strategy which has been identified as enhancing between- and within-rater reliability is the use of analytic marking (Barkaoui, 2011). Analytic marking contrasts with holistic marking and consists of awarding a mark for each criterion; the final mark is then derived from the marks awarded for the different elements. This might support inter- and intra-marker consistency when dealing with essay-format answers and may offer markers an alternative tool for minimising the reported loops of checking and wading through scripts.
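As a minimal sketch of how such a scheme can work in practice, the example below combines criterion-level marks into a final mark using weightings; the criteria, weightings and marks are hypothetical and are not drawn from Barkaoui (2011) or Ahmed and Pollitt (2011).

```python
# Illustrative sketch only: deriving a final mark from criterion-level marks.
# The criteria, weightings and marks below are hypothetical examples.

# Each criterion is marked out of 100 and carries an agreed weighting.
WEIGHTS = {
    "argument": 0.40,
    "use_of_evidence": 0.30,
    "structure": 0.20,
    "presentation": 0.10,
}

def final_mark(criterion_marks: dict) -> float:
    """Combine criterion marks into an overall mark as a weighted sum."""
    return round(sum(WEIGHTS[c] * m for c, m in criterion_marks.items()), 1)

# A script marked criterion by criterion rather than holistically:
print(final_mark({
    "argument": 68,
    "use_of_evidence": 62,
    "structure": 70,
    "presentation": 75,
}))  # -> 67.3
```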
References
AHMED, A. and POLLITT, A., 2011. Improving marking quality through a taxonomy of mark schemes. Assessment in Education: Principles, Policy & Practice, 18(3), pp. 259-278.
BARKAOUI, K., 2011. Effects of marking method and rater experience on ESL essay scores and rater performance. Assessment in Education: Principles, Policy & Practice, 18(3), pp. 279-293.
BROWN, G., 2001. Assessment: A guide for lecturers. LTSN Assessment Series. Available at: http://www.bioscience.heacademy.ac.uk/ftp/Resources/gc/assess03Lecturers.pdf [Accessed 15 September 2011].
CARLSON, T., MACDONALD, D., GORELY, T., HANRAHAN, S. and BURGESS-LIMERICK, R., 2000. Implementing criterion-referenced assessment within a multi-disciplinary university department. Higher Education Research & Development, 19(1), pp. 104-116.
CUMMING, A., KANTOR, R. and POWERS, D.E., 2002. Decision Making while Rating ESL/EFL Writing Tasks: A Descriptive Framework. The Modern Language Journal, 86, pp. 67-96.
DUNN, L., PARRY, S. and MORGAN, C., 2002. Seeking quality in criterion referenced assessment. Proceedings of the Learning Communities and Assessment Cultures Conference, University of Northumbria, 28-30 August. Available at: http://www.leeds.ac.uk/educol/documents/00002257.htm [Accessed 4 October 2011].
HUGHES, D.C., KEELING, B. and TUCK, B.F., 1980. The influence of context position and scoring method on essay scoring. Journal of Educational Measurement, 17, pp. 131-135.
McCOLLY, W., 1970. What does educational research say about the judging of writing ability? Journal of Educational Research, 64, pp. 147-156.
MEADOWS, M. and BILLINGTON, L., 2005. A Review of the Literature on Marking Reliability. AQA Research Paper RPA_05_MM_RP_05.
MILLER, A.H., IMRIE, B.W. and COX, K., 1998. Student Assessment in Higher Education: A Handbook for Assessing Performance. London: Kogan Page.
SADLER, D.R., 1987. Specifying and promulgating achievement standards. Oxford Review of Education, 3(2), pp. 191-207.
SADLER, D.R., 1989. Formative assessment and the design of instructional systems. Instructional Science, 18, pp. 119-144.
SPEAR, M., 1996. The influence of halo effects upon teachers' assessments of written work. Research in Education, 56, pp. 85-87.