Corpus-Assisted Discourse Studies: Integrating Analytical Tools
In Corpus-Assisted Discourse Studies (CADS), corpus tools make it possible to observe linguistic patterns that are difficult to detect through manual reading alone. However, as Baker insists, analysis should not be reduced to mere word counting. A robust discourse study combines several methods in order to move from quantitative evidence to qualitative interpretation.
1. Frequency: The Starting Point
Frequency lists identify the most common words, lemmas, or clusters, indicating the lexical or thematic focus of a corpus. Frequency highlights potential areas of interest, but it does not explain meaning on its own: a high count may simply reflect genre conventions or repetition within a single document.
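As a minimal sketch of this step, the Python snippet below builds a frequency list and also records how many documents each word appears in, so that a word inflated by repetition in one text stands out. The tokenizer, the function names, and the three sample sentences are illustrative assumptions, not part of any particular corpus tool.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(documents):
    """Return (raw counts, number of documents containing each word)."""
    total = Counter()
    doc_freq = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        total.update(tokens)          # every occurrence counts
        doc_freq.update(set(tokens))  # each document counts once per word
    return total, doc_freq


# Hypothetical mini-corpus, invented for illustration only.
docs = [
    "The crisis deepened. The crisis spread. Crisis talks failed.",
    "Ministers met to discuss the budget.",
    "The budget was approved after long talks.",
]

total, doc_freq = frequency_list(docs)
for word, count in total.most_common(5):
    print(f"{word:10} count={count} documents={doc_freq[word]}")
```

In this toy data, "crisis" occurs three times but only in one document, which is exactly the kind of result that should not be read as a corpus-wide theme without further checks.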
2. Dispersion and Distribution
These two checks qualify a frequency count by showing where in the corpus a pattern occurs:
- Dispersion: Shows whether a word is spread throughout the corpus or concentrated in a specific section, which guards against false claims of representativeness.
- Distribution: Identifies which sub-corpora (e.g., newspapers, spoken vs. written, specific time periods) contain the pattern.
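Both checks can be sketched in a few lines of Python. For dispersion, the example uses Gries's Deviation of Proportions (DP), which is only one of several available measures and is chosen here for illustration; values near 0 indicate an even spread and values near 1 indicate concentration. For distribution, it computes frequencies normalized per thousand words in each sub-corpus. The part sizes, counts, and sub-corpus names are invented.

```python
def deviation_of_proportions(part_counts, part_sizes):
    """Gries's DP: half the summed gap between observed and expected shares."""
    total_hits = sum(part_counts)
    total_size = sum(part_sizes)
    if total_hits == 0:
        raise ValueError("the word does not occur in the corpus")
    return 0.5 * sum(
        abs(count / total_hits - size / total_size)
        for count, size in zip(part_counts, part_sizes)
    )


def per_thousand(count, size):
    """Normalize a raw count to occurrences per 1,000 tokens."""
    return 1000 * count / size


# Dispersion: the same 40 hits, spread evenly vs. packed into one part.
part_sizes = [1000, 1000, 1000, 1000]
even = [10, 10, 10, 10]
clumped = [40, 0, 0, 0]
print("DP even:   ", deviation_of_proportions(even, part_sizes))
print("DP clumped:", deviation_of_proportions(clumped, part_sizes))

# Distribution: hypothetical sub-corpora as (raw count, size in tokens).
subcorpora = {"broadsheet": (12, 6000), "tabloid": (30, 4000)}
for name, (count, size) in subcorpora.items():
    print(name, per_thousand(count, size))
```

Normalizing matters because sub-corpora rarely have the same size; comparing the raw counts 12 and 30 directly would understate how much more frequent the word is in the smaller sub-corpus.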
3. Concordance Analysis: Meaning in Context
Using the Key Word in Context (KWIC) format, researchers examine each occurrence of the node word alongside its immediate co-text. This makes it possible to observe:
- Adjectives and verbs associated with the term.
- The semantic role of the term (agent vs. patient).
- Metaphorical, quantitative, or evaluative associations.
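A KWIC display is straightforward to produce. The following is a minimal sketch, assuming a simple regular-expression tokenizer and a window measured in tokens; the function name `kwic` and the sample sentence are hypothetical and do not reproduce the behavior of any specific concordancer.

```python
import re


def kwic(text, node, window=4):
    """Return (left context, node, right context) for each hit of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Invented sample text for illustration.
text = "Refugees fled the war. Officials said refugees were flooding the border."

for left, node, right in kwic(text, "refugees", window=2):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>20}  [{node}]  {right}")
```

Aligning the node word in a central column is what makes patterns visible: here the right-hand context shows the term first as the agent of "fled" and then paired with the water metaphor "flooding".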
4. Collocation and Semantic Preference
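Collocates are words that co-occur with the node word more often than chance would predict, and they reveal its recurrent associations. By analyzing these, researchers can identify:
- Discourse Prosody: The positive or negative evaluation a term acquires through repeated co-occurrence with other terms.
- Semantic Preference: A tendency to co-occur with words from a particular semantic field.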
Note: Researchers must justify their choice of association measure (e.g., Mutual Information (MI), log-likelihood, or LogDice), since each measure ranks collocates differently, and must verify the resulting associations through concordances.
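To make the difference between measures concrete, the sketch below scores window-based collocates with two of them: a simple form of MI, log2(O x N / (f_node x f_collocate)), without any correction for window size, and LogDice, 14 + log2(2 x O / (f_node + f_collocate)). Here O is the number of co-occurrences within the window and N is the corpus size in tokens. The function name, the window handling, and the token list are assumptions made for illustration; real tools differ in how they count and correct these values.

```python
import math
from collections import Counter


def collocates(tokens, node, window=4):
    """Score every word found within `window` tokens of `node`."""
    freq = Counter(tokens)
    cooc = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        context = tokens[start:i] + tokens[i + 1:i + 1 + window]
        cooc.update(context)

    n = len(tokens)
    rows = []
    for word, observed in cooc.items():
        # Simple MI: observed co-occurrence against chance expectation.
        mi = math.log2(observed * n / (freq[node] * freq[word]))
        # LogDice: depends only on the three frequencies, not on corpus size.
        log_dice = 14 + math.log2(2 * observed / (freq[node] + freq[word]))
        rows.append((word, observed, mi, log_dice))
    # Rank by LogDice, strongest association first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Invented token list for illustration.
tokens = ["illegal", "immigrants", "arrived", "illegal", "immigrants", "left",
          "the", "immigrants", "stayed", "illegal", "parking"]

for word, observed, mi, log_dice in collocates(tokens, "immigrants", window=1):
    print(f"{word:10} O={observed} MI={mi:.2f} logDice={log_dice:.2f}")
```

MI tends to reward rare words that occur almost exclusively with the node, whereas LogDice is less sensitive to very low frequencies, which is one reason the choice of measure needs to be stated and the top collocates checked in concordance lines.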
5. Keyness: Identifying Distinctive Features
Keyness identifies words that are statistically more (or less) frequent in a target corpus than in a reference corpus. This is vital for highlighting differences between genres, discourse communities, or ideological positions.
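One widely used keyness statistic is log-likelihood, sketched below for a single word. The expected frequencies are what each corpus would contain if the word were equally frequent in both, and the score grows as the observed counts depart from that expectation. The function name and the counts are hypothetical; log-likelihood is only one option, and an effect-size measure is often reported alongside it.

```python
import math


def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Log-likelihood keyness for one word across two corpora."""
    combined = freq_target + freq_ref
    total_size = size_target + size_ref
    expected_target = size_target * combined / total_size
    expected_ref = size_ref * combined / total_size

    score = 0.0
    # A zero count contributes nothing (the limit of x * ln(x) is 0).
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


# Hypothetical counts: 100 hits in a 10,000-token target corpus
# against 50 hits in a 10,000-token reference corpus.
ll = log_likelihood(100, 50, 10_000, 10_000)
print(f"LL = {ll:.2f}")

# Identical relative frequencies give a score of zero.
print(log_likelihood(50, 50, 10_000, 10_000))
```

A score above 3.84 corresponds to p < 0.05 at one degree of freedom. The statistic itself does not show direction, so the relative frequencies still need to be compared to tell whether the word is overused or underused in the target corpus.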
Conclusion: The CADS Approach
The strength of CADS lies in the synergy of these tools:
- Frequency directs attention.
- Dispersion and Distribution test representativeness.
- Keyness highlights distinctiveness.
- Concordances and Collocates explain meaning in use.
A high-quality analysis does not simply produce tables; it explains how linguistic patterns contribute to the construction of socially meaningful discourses.
