Language Test Design: Approaches and Key Elements
Approaches to Language Test Construction
Direct vs. Indirect Testing
Indirect testing is sometimes argued to be superior to direct testing on the grounds that its results generalize beyond the particular tasks used; each approach, however, has its own attractions, outlined below.
Direct Testing
Direct testing requires the candidate to perform precisely the skill we wish to measure. For example, if we want to know how well candidates can write compositions, we have them write compositions. The tasks and texts used should be as authentic as possible. The very acts of speaking and writing provide us with direct information about a candidate’s ability. With listening and reading, however, it is necessary to get candidates not only to listen or read but also to demonstrate that they have done this successfully.
Direct testing has several attractions:
- Provided that we are clear about the specific abilities we want to assess, it allows us to create conditions that elicit the behavior on which to base our judgments.
- In the case of productive skills, the assessment and interpretation of students’ performance is quite straightforward.
- Since practice for the test involves practicing the skills we wish to foster, there is likely to be a helpful backwash effect.
Indirect Testing
Indirect testing attempts to measure the abilities that underlie the skills in which we are interested. For example, a method of testing pronunciation ability might involve a paper-and-pencil test where the candidate has to identify pairs of words that rhyme with each other. Perhaps the main appeal of indirect testing is that it seems to offer the possibility of testing a representative sample of a finite number of underlying abilities, which manifest in a potentially indefinitely large number of ways.
Discrete Point vs. Integrative Testing
Discrete Point Testing
Discrete point testing refers to the assessment of one element at a time, item by item. It might involve a series of items, each testing a particular grammatical structure.
Integrative Testing
Integrative testing requires the candidate to combine many language elements in the completion of a task. This might involve writing a composition, making notes while listening to a lecture, or taking a dictation.
Norm-Referenced vs. Criterion-Referenced Testing
Norm-referenced testing relates a candidate's performance to that of other candidates: a score tells us where the candidate stands in relation to the group, not what he or she can actually do with the language. Criterion-referenced testing relates performance to a defined standard or set of tasks, so that a score tells us what the candidate can do, regardless of how other candidates perform.
Objective vs. Subjective Testing
This distinction relates to methods of scoring.
Objective Scoring
Objective scoring occurs when no judgment is required on the part of the scorer. This approach typically yields greater scorer reliability. An example is a multiple-choice test.
Subjective Scoring
Subjective scoring is necessary when the judgment of the scorer is required. There are varying degrees of subjectivity: the impressionistic scoring of a composition may be highly subjective, whereas the scoring of short answers in response to questions on a reading passage is less subjective. Generally, the less subjective the scoring is, the greater the agreement between two different scorers.
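The claim about scorer agreement can be made concrete. The following is a minimal sketch (not drawn from the source) of one common way to quantify agreement between two scorers: the Pearson correlation between the marks they award to the same scripts. The scores and scorer labels are invented for illustration.

```python
# A minimal sketch: Pearson correlation between the marks two scorers
# award to the same set of scripts, used as a simple index of agreement.
# All data below are invented for illustration.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical marks (out of 20) given by two scorers to ten compositions.
scorer_a = [14, 11, 17, 9, 15, 12, 18, 10, 13, 16]
scorer_b = [13, 12, 16, 8, 15, 11, 17, 11, 12, 15]

print(f"Inter-scorer correlation: {pearson_r(scorer_a, scorer_b):.2f}")
```

A high correlation shows that the two scorers rank the scripts similarly; it does not by itself guarantee that they award the same absolute marks, so differences in means are usually examined as well.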
Key Elements in Communicative Language Test Construction
Test Specifications
A test specification is a detailed document, often for internal purposes only, and sometimes confidential to the examining body. In contrast, the syllabus is a public document, often much simplified, which indicates to test users what the test will contain.
Item Writing and Moderation
Item writers must begin their writing task with the test's specifications. While this may seem obvious, it is surprising how many writers try to begin item writing by looking at past papers rather than at the specifications. Such reliance on past papers often reflects the fact that many tests lack proper specifications in the first place.
Moderation
Moderation involves an editing committee: no single person can produce a good test without advice. The committee reviews each item and the test as a whole, with members often attempting each item themselves, as if they were candidates.
Pretesting and Analysis
Pretesting (Trials)
No matter how well designed an examination may be, and however carefully it has been edited, its effectiveness cannot be fully known until it has been tried out on students.
Test Analysis
Test analysis involves:
- Correlations: Examining relationships between different versions or skills.
- Item Analysis: Evaluating individual items on the basis of their facility value (the proportion of candidates answering the item correctly) and their discrimination index (how well the item separates stronger from weaker candidates); see the sketch after this list.
- Reliability Indexes: Measuring consistency through methods such as retest, alternate versions, or split-half (also illustrated in the sketch below).
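As an illustration of the quantities named above, here is a minimal Python sketch, assuming dichotomously (0/1) scored items and invented response data, of how facility value, a simple upper-lower discrimination index, and split-half reliability with the Spearman-Brown correction might be computed. It sketches the standard formulas, not any particular examining body's procedure.

```python
import numpy as np

# Hypothetical 0/1 scored responses: rows = candidates, columns = items.
rng = np.random.default_rng(0)
responses = (rng.random((30, 10)) < 0.6).astype(int)

# Facility value: the proportion of candidates answering each item correctly.
facility = responses.mean(axis=0)

# Discrimination index (upper-lower method): facility in the top-scoring
# third of candidates minus facility in the bottom-scoring third.
totals = responses.sum(axis=1)
order = np.argsort(totals)
k = len(order) // 3
discrimination = responses[order[-k:]].mean(axis=0) - responses[order[:k]].mean(axis=0)

# Split-half reliability: correlate scores on odd- and even-numbered items,
# then apply the Spearman-Brown correction for full test length.
odd, even = responses[:, 0::2].sum(axis=1), responses[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
reliability = 2 * r_half / (1 + r_half)

print("Facility values:      ", np.round(facility, 2))
print("Discrimination index: ", np.round(discrimination, 2))
print(f"Split-half reliability (Spearman-Brown): {reliability:.2f}")
```

In practice, an item with an extreme facility value or a low (or negative) discrimination index would normally be revised or dropped before the operational version of the test is assembled.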
Training of Examiners
Training covers various marking approaches, such as holistic vs. analytic marking, and techniques such as multi-facet Rasch analysis, which models item difficulty and scorer severity (as well as candidate ability) on a common scale.
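The mechanics of Rasch analysis are beyond the scope of these notes, but the sketch below may help fix the idea. It fits a basic dichotomous Rasch model (candidate ability and item difficulty only) by simple gradient ascent on the joint likelihood, using invented data; this is an illustrative assumption rather than a procedure described in the source, and operational many-facet analyses that also model scorer severity would normally rely on dedicated software.

```python
import numpy as np

def fit_rasch(responses, n_iter=500, lr=0.1):
    """Fit a basic dichotomous Rasch model by gradient ascent on the joint
    log-likelihood. responses: 0/1 matrix, rows = candidates, columns = items.
    Returns (abilities, difficulties) in logits. Candidates with perfect or
    zero scores have no finite maximum-likelihood estimate, so their values
    are only rough."""
    n_persons, n_items = responses.shape
    theta = np.zeros(n_persons)   # candidate abilities
    beta = np.zeros(n_items)      # item difficulties
    for _ in range(n_iter):
        # Model probability of a correct response for every candidate-item pair.
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
        residual = responses - p
        theta += lr * residual.mean(axis=1)   # raise ability where observed > expected
        beta -= lr * residual.mean(axis=0)    # raise difficulty where observed < expected
        beta -= beta.mean()                   # anchor mean difficulty at zero
    return theta, beta

# Hypothetical scored responses for 40 candidates on 8 items.
rng = np.random.default_rng(1)
true_theta = rng.normal(0, 1, 40)
true_beta = np.linspace(-1.5, 1.5, 8)
probs = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_beta[None, :])))
data = (rng.random(probs.shape) < probs).astype(int)

abilities, difficulties = fit_rasch(data)
print("Estimated item difficulties:", np.round(difficulties, 2))
```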
Validation
Validity is crucial in testing and can be investigated through rational (e.g., content) validation, empirical (criterion-related) validation, or construct validation.
Post-Test Reports
Post-test reports are provided to examiners, candidates, and teachers who prepare students for the test.