Psychometrics: Measurement in Psychology
Psychometrics is the discipline concerned with measuring psychological characteristics. It uses statistical procedures to assess the reliability and validity of psychological tests, that is, whether a test measures a specific psychological variable consistently and accurately. The field is divided into scaling theory and test and measurement theory. Francis Galton’s work on eugenics, which aimed to improve humanity’s biological heritage through selective breeding, sparked interest in statistical measurement within psychology, particularly in sensorimotor tests.
Several experts have defined psychometrics:
- Yela (1968): Psychometrics encompasses all measurements in the psychological field, developing through two relatively independent branches: psychophysical methods and test methods.
- Nunnally (1973): Psychometrics involves developing and using measurement techniques across all aspects of psychology.
- Rivas (1979): There is general agreement that psychometrics deals with measuring observable behaviors of a psychological nature.
- Macia (1982): Psychometrics is a discipline within psychological science with a dual function: theoretical (studying measurement and its possibilities in psychology) and practical (applying this knowledge to individual psychological issues, including methods and instruments).
- Muñiz (1992): Psychometrics is a set of methods, techniques, and theories involved in measuring psychological variables, emphasizing metric properties regardless of the field of application or instruments used.
- Barbero (1994): A methodological discipline within psychology, primarily tasked with quantifying behavioral manifestations, with both theoretical implications (whether measurement is possible and by what criteria) and practical ones (how and with what to measure).
- Martinez Arias (1995): Psychometrics encompasses all formal models enabling measurement of psychological variables, focusing on the conditions for proper measurement processes in psychology.
Psychometrics has two main aspects:
- Theoretical: Theories describing, categorizing, and assessing the usefulness and accuracy of psychological measurements, while also exploring new methods and mathematical models for improved instruments.
- Practical: Providing adequate tools for accurate measurement and their applications, using formal mathematical language.
Psychometrics and Psychology: Psychometrics supports psychology in any field requiring measurement. It contributes to the foundation, development, and validation of constructs and variables within psychological theories, playing both instrumental and conceptual roles. It drives the development of psychometric theory and methods.
Measurement Theory: Explores the necessary and sufficient conditions for measurement, the relationships between numbers and the properties of measured objects, and the levels of measurement. It describes, categorizes, and evaluates the quality of measures to enhance their usefulness and accuracy, and proposes new methods for developing better instruments. In the words of Aftanas (1988), it explains how and when numbers can represent information about the quantity of an attribute.
Test Theory: Examines the factors that influence observed test scores, as specified by the assumptions of each model. It develops mathematical models for analyzing subjects’ responses, with a focus on test construction and validation. These models for test scores and data analysis yield:
- Scalar values of subjects: Estimating the level of the measured characteristic in subjects.
- Scalar values of items: Estimating item parameters.
A key problem is relating the subject’s level in the unobserved variable to their observed test score. Test theory is divided into classical test theory and item response theory.
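As a rough illustration of how a model can link an unobserved level to an observed score, the sketch below uses the one-parameter logistic (Rasch) form from item response theory. The function names, the ability values, and the item difficulties are hypothetical and chosen only for the example; the text above does not commit to this particular model.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the one-parameter logistic (Rasch) form."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def expected_score(ability, difficulties):
    """Expected number-correct score: the sum of the item probabilities."""
    return sum(rasch_probability(ability, b) for b in difficulties)

# Hypothetical item difficulties, expressed on the same latent scale as ability.
item_difficulties = [-1.0, 0.0, 1.0]

# A higher latent level gives a higher expected observed score.
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(expected_score(theta, item_difficulties), 3))
```

Here the subject's scalar value is the ability and each item's scalar value is its difficulty; in practice both would be estimated from response data rather than fixed by hand.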
Classical Test Theory: The classical linear model, proposed by Charles Spearman in 1904, was the dominant model of test theory until later developments such as generalizability theory (Cronbach et al., 1960s and 1970s) and item response theory (Lord and Rasch, 1960s).
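In the usual textbook notation (the symbols below are a common convention, not taken from the text above), the classical linear model writes an observed score X as the sum of a true score T and a random error E, and defines reliability as the share of observed-score variance due to true scores:

```latex
X = T + E, \qquad \mathbb{E}[E] = 0, \qquad \operatorname{Cov}(T, E) = 0

\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
```

The variance decomposition in the second line follows from the assumption that true scores and errors are uncorrelated.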
Scaling: Methods for developing psychophysical and psychological scales for constructs that cannot be directly observed. Scaling assumes that psychological objects (stimuli, subjects, responses) can be ordered along a continuum, and it uses mathematical techniques to determine the numbers that represent different amounts of the measured property. Bunge defines a construct as a non-observational concept that cannot be directly demonstrated or manipulated but is inferred from behavior. Psychophysics is the historical origin of scaling methods.
Construct: A hypothetical entity within a scientific theory that is assumed to exist but whose definition is difficult or controversial.
Measurement in Psychology: Must be indirect, relative, and probabilistic. Lord and Novick (1968) emphasized methodological rigor: the measured object must be observable (directly or indirectly) and show variability, and the instruments used must be precise.
Classical Theory of Objects: Assumes a true value of the measured attribute (its magnitude). Measurement assesses this magnitude or quantity, answering: How much of the attribute does an object possess? Quantitative measurement requires concatenation or combination of objects, adding the numbers assigned to them. Fechner defined measuring a quantity as determining how much of a unit of the same class it contains. Hölder’s axioms describe conditions for measurable magnitude, forming an extensive (ordinal and additive) structure.
Psychology lacks units that can be concatenated for continuous measurement. Not all numerical operations correspond to operations applicable to behavioral phenomena. Campbell noted direct measurement is impractical in psychology. Solutions include operational theory and Stevens’ representational theory.
Operational Theory: Focuses on developing operations that produce consistent numbers, in order to discover quantitative relationships. Torgerson (1958) and Dawes (1972) described such measures with terms like measurement by fiat, measures by definition, indicator measures, or index measures. Rojas Tejada (2001) noted that constructs lack a unique set of indicators, so researchers must choose the empirical indicators themselves. Evaluation focuses on the operation’s usability and functionality.
Representational Theory of Measurement: Numbers are assigned so that the relationships among them represent the empirical relationships among the properties of the objects.
Stevens’ Measurement Scales and Meaningfulness: In Stevens’ view, the scale level of a measurement determines which statistical analyses are appropriate. Lord (1953), Anderson (1961), and Prytulak (1975) argued instead that measurement and statistics are separate domains. Fraser (1980) noted that while empirical meaningfulness does not affect the validity of the numerical manipulations themselves, it does affect the conclusions drawn from those manipulations.
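The point attributed to Fraser can be illustrated with a small sketch. The two groups, their codes, and the recoding below are hypothetical. The arithmetic is valid on both codings, but a conclusion based on means changes under an order-preserving recoding of ordinal data, while one based on medians does not.

```python
from statistics import mean, median

# Hypothetical ordinal ratings for two groups; only the order of the codes carries meaning.
group_a = [1, 1, 10]
group_b = [3, 3, 3]

# An order-preserving recoding of the same categories (1 < 3 < 10 becomes 1 < 2 < 3).
recode = {1: 1, 3: 2, 10: 3}
recoded_a = [recode[x] for x in group_a]
recoded_b = [recode[x] for x in group_b]

def a_exceeds_b(stat, a, b):
    """Conclusion 'group A scores higher than group B' according to the given statistic."""
    return stat(a) > stat(b)

mean_before = a_exceeds_b(mean, group_a, group_b)
mean_after = a_exceeds_b(mean, recoded_a, recoded_b)
median_before = a_exceeds_b(median, group_a, group_b)
median_after = a_exceeds_b(median, recoded_a, recoded_b)

print("mean-based conclusion stable:", mean_before == mean_after)
print("median-based conclusion stable:", median_before == median_after)
```

Nothing in the computation of the mean is invalid; what changes is whether the resulting comparison says anything about the underlying attribute.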
Types of Measurement: Fundamental (length, weight), derived (stellar), and by indicators (social sciences). Variables measured by indices have observable empirical correlates. Latent variables are theoretical constructs without direct empirical counterparts.
Steps for measurement by indices:
- Conceptual Definition: Nominal definition, theoretical readings, personal reflection, and a theoretically defensible proposal with clear boundaries. Define dimensions/sub-scales, providing rationale.
- Definition of Indicators: Operational definition, proposed indicators (observational, written/oral responses, external observations, records), selection and justification (easy to register, related to the concept, theoretical and empirical validity, sensitivity to concept variation).
- Creating Indices and Sub-Indices: Mathematical formulas combining indicators into sub-scale scores and sub-indices into total scale scores.
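The last step can be sketched in a few lines. The sub-scale names, item names, and the choice of the mean as the combining formula are all assumptions made for illustration; the steps above do not prescribe a particular formula.

```python
from statistics import mean

# Hypothetical scale structure: sub-scale name -> names of its indicators (items).
SUBSCALES = {
    "worry": ["item1", "item2"],
    "tension": ["item3", "item4", "item5"],
}

def subscale_scores(responses, subscales=SUBSCALES):
    """Sub-index for each sub-scale: here, the mean of its indicators."""
    return {
        name: mean([responses[item] for item in items])
        for name, items in subscales.items()
    }

def total_score(responses, subscales=SUBSCALES):
    """Total index: here, the mean of the sub-indices (each sub-scale weighted equally)."""
    return mean(list(subscale_scores(responses, subscales).values()))

# One hypothetical respondent's indicator values.
respondent = {"item1": 4, "item2": 2, "item3": 5, "item4": 3, "item5": 4}

print(subscale_scores(respondent))
print(total_score(respondent))
```

Averaging the sub-indices rather than all items weights each sub-scale equally regardless of how many indicators it has; summing items directly would be an equally defensible choice, and that choice needs the same justification as the choice of indicators.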
Criticisms of Measurement by Indices: Unclear unit of measurement, unclear interval level properties, unclear respect for object relationships, unclear score meaning (multiple ways to obtain it), arbitrariness, combining incomparable objects.
Defenders of Measurement by Indices: Statistically assessable reliability and validity, empirical support, and ongoing development of models to give scores more meaning.
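As one example of how the reliability of an index can be assessed statistically, the sketch below computes Cronbach's alpha, a widely used internal-consistency coefficient. The coefficient is not named in the text above and is chosen here only as an illustration; the response matrix is hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical data: four respondents answering three items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together; with only four respondents the estimate here is purely illustrative.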