Language Test Design: Approaches and Key Elements

Approaches to Language Test Construction

Direct vs. Indirect Testing

Indirect testing is sometimes argued to yield more generalizable results than direct testing, since it targets underlying abilities rather than particular performances; both approaches, and the attractions of each, are outlined below.

Direct Testing

Direct testing requires the candidate to perform precisely the skill we wish to measure. For example, if we want to know how well candidates can write compositions, we have them write compositions. The tasks and texts used should be as authentic as possible. The very acts of speaking and writing provide us with direct information about a candidate’s ability. With listening and reading, however, it is necessary to get candidates not only to listen or read but also to demonstrate that they have done this successfully.

Direct testing has several attractions:

  1. Provided that we are clear about the specific abilities we want to assess, it allows us to create conditions that elicit the behavior on which to base our judgments.
  2. In the case of productive skills, the assessment and interpretation of students’ performance are quite straightforward.
  3. Since practice for the test involves practicing the skills we wish to foster, there is likely to be a helpful backwash effect.

Indirect Testing

Indirect testing attempts to measure the abilities that underlie the skills in which we are interested. For example, a method of testing pronunciation ability might involve a paper-and-pencil test where the candidate has to identify pairs of words that rhyme with each other. Perhaps the main appeal of indirect testing is that it seems to offer the possibility of testing a representative sample of a finite number of underlying abilities, abilities which can manifest themselves in an indefinitely large number of ways.

Discrete Point vs. Integrative Testing

Discrete Point Testing

Discrete point testing refers to the assessment of one element at a time, item by item. It might involve a series of items, each testing a particular grammatical structure.

Integrative Testing

Integrative testing requires the candidate to combine many language elements in the completion of a task. This might involve writing a composition, making notes while listening to a lecture, or taking a dictation.

Norm-Referenced vs. Criterion-Referenced Testing

A norm-referenced test relates a candidate’s performance to that of other candidates, for example by reporting that the candidate scored in the top ten percent of the group. A criterion-referenced test relates performance to a defined standard, reporting what the candidate can actually do in the language irrespective of how other candidates perform.

Objective vs. Subjective Testing

This distinction relates to methods of scoring.

Objective Scoring

Objective scoring occurs when no judgment is required on the part of the scorer; a multiple-choice test marked against a fixed answer key is the classic example. Because the scoring is mechanical, it is typically more reliable.
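
To illustrate how mechanical objective scoring is, here is a minimal Python sketch that marks multiple-choice responses against a fixed answer key. The key, the responses, and the function name are invented for illustration; any scorer, human or program, applying the same key arrives at the same score.

    # Minimal sketch: objective scoring of a multiple-choice test.
    # The answer key and the candidate's responses are invented examples.
    ANSWER_KEY = ["B", "D", "A", "C", "B"]

    def objective_score(responses, key=ANSWER_KEY):
        """Count responses that match the key; no judgment is involved."""
        return sum(1 for given, correct in zip(responses, key) if given == correct)

    print(objective_score(["B", "D", "C", "C", "B"]))  # prints 4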

Subjective Scoring

Subjective scoring is necessary when the judgment of the scorer is required. There are varying degrees of subjectivity: the impressionistic scoring of a composition may be highly subjective, whereas the scoring of short answers in response to questions on a reading passage is less subjective. Generally, the less subjective the scoring is, the greater the agreement between two different scorers.
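
As a rough illustration of scorer agreement, the sketch below computes the exact-agreement rate between two scorers’ marks for the same set of scripts. The score lists are invented, and in practice fuller statistics (such as a correlation between the two scorers) would normally be reported as well.

    # Minimal sketch: exact agreement between two scorers on the same scripts.
    # The marks below are invented; each position is one candidate's script.
    scorer_a = [4, 3, 5, 2, 4, 3, 5, 1]
    scorer_b = [4, 3, 4, 2, 4, 2, 5, 1]

    agreement = sum(a == b for a, b in zip(scorer_a, scorer_b)) / len(scorer_a)
    print(f"exact agreement: {agreement:.0%}")  # prints 75%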

Key Elements in Communicative Language Test Construction

Test Specifications

A test specification is a detailed document, often for internal purposes only, and sometimes confidential to the examining body. In contrast, the syllabus is a public document, often much simplified, which indicates to test users what the test will contain.

Item Writing and Moderation

Item writers must begin their task with the test’s specifications. Obvious as this may seem, it is surprising how many writers begin by looking at past papers rather than at the specifications, often simply because many tests have no proper specifications to consult.

Moderation

Moderation involves an editing committee, since no single person can produce a good test without advice. The committee reviews each item and the test as a whole, often working through each item as if its members were the candidates taking the test.

Pretesting and Analysis

Pretesting (Trials)

No matter how well designed an examination may be, and however carefully it has been edited, its effectiveness cannot be fully known until it has been tried out on students.

Test Analysis

Test analysis involves:

  • Correlations: Examining relationships between different versions of the test or between different skills.
  • Item Analysis: Evaluating individual items on measures such as facility value and discrimination index (a computational sketch follows this list).
  • Reliability Indexes: Measuring consistency through methods like retest, alternate versions, or split-half (see the second sketch below).
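
To make facility value and discrimination index concrete, here is a minimal Python sketch under common classical assumptions: items are scored 0/1, facility value is the proportion of candidates answering an item correctly, and the discrimination index compares the top and bottom thirds of candidates ranked by total score. The thirds convention, the function names, and the response data are illustrative choices, not part of any particular test’s procedure.

    # Minimal sketch of classical item analysis on a 0/1 response matrix
    # (rows = candidates, columns = items). All data below are invented.
    def facility_value(item_scores):
        """Proportion of candidates who answered the item correctly."""
        return sum(item_scores) / len(item_scores)

    def discrimination_index(item_scores, total_scores, group_fraction=1/3):
        """Correct rate in the top group minus the bottom group, with groups
        formed from total scores (top/bottom thirds is one common convention)."""
        n = max(1, int(len(total_scores) * group_fraction))
        order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
        bottom, top = order[:n], order[-n:]
        return (sum(item_scores[i] for i in top) -
                sum(item_scores[i] for i in bottom)) / n

    responses = [  # 6 candidates x 4 items, 1 = correct
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    totals = [sum(row) for row in responses]
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        print(f"item {j + 1}: FV = {facility_value(item):.2f}, "
              f"D = {discrimination_index(item, totals):.2f}")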
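
Similarly, a minimal sketch of the split-half method mentioned above: candidates’ scores on two halves of the test (for example, odd- versus even-numbered items) are correlated, and the Spearman-Brown formula projects that correlation to the full test length. The half-score data and function names are invented for illustration.

    # Minimal sketch of a split-half reliability estimate. Each list holds
    # candidates' scores on one half of the test; the numbers are invented.
    import statistics

    def pearson(x, y):
        """Pearson correlation between two lists of scores."""
        mx, my = statistics.mean(x), statistics.mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    def spearman_brown(r_half):
        """Project the half-test correlation to the full test length."""
        return 2 * r_half / (1 + r_half)

    odd_half = [1, 1, 2, 0, 1, 0]   # scores on odd-numbered items
    even_half = [2, 1, 2, 1, 1, 0]  # scores on even-numbered items
    r = pearson(odd_half, even_half)
    print(f"half-test correlation = {r:.2f}, "
          f"split-half reliability = {spearman_brown(r):.2f}")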

Training of Examiners

Training covers marking approaches such as holistic vs. analytic marking, as well as techniques like many-facet Rasch analysis, which models candidates, items, and scorers within a single framework (a sketch of the model follows).
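
As an illustrative sketch only, one widely used formulation that treats candidates, items, and scorers together is the many-facet Rasch model, in which the log-odds of a candidate being awarded score category k rather than k-1 are modeled additively:

    \log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k

Here B_n is the ability of candidate n, D_i the difficulty of item i, C_j the severity of scorer j, and F_k the difficulty of the step up to category k. Fitting the model yields separate estimates for each facet, which is why it is useful both for comparing items and for monitoring scorer behavior.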

Validation

Validity is crucial in testing. It can be investigated rationally (content validity), empirically (criterion-related validity, whether concurrent or predictive), or through construct validation.

Post-Test Reports

Post-test reports are provided to examiners, candidates, and teachers who prepare students for the test.