Principles of Effective Data Visualization
Foundations of Visualization
Goals
Visualization uses visuals to understand and communicate data. Its goals fall into four main categories:
- Record Information
- Analyze Data to Support Reasoning (Example: the analysis of O-ring data for the Challenger disaster.)
- Confirm Hypotheses (Example: Dr. John Snow’s 1854 cholera map of London, which pinpointed the location of the contaminated water pump.)
- Communicate Ideas to Others
Why Visualization Works
It offloads cognitive work to the perceptual system; cognition and memory are limited, but perception is fast.
The Human Visual System and Perception
Perception
The brain interprets signals from the eyes.
- Rods and Cones: The retina contains approximately 120 million rods (low-light vision) and 5–6 million cones (color and detailed vision).
- Foveation and Saccades: Vision is sharpest at the center of our gaze (the fovea). To build a complete picture, eyes make rapid movements (saccades) to new points of focus (fixations). We only see a small area at any given moment; the brain fills in the pieces. Crisp, uncluttered visuals are easier to interpret.
Edge Detection
The visual system is wired to detect edges and changes in luminance, not to perceive absolute brightness values. This is due to the receptive fields of ganglion cells in the retina, which increase or decrease their firing rate at edges. This leads to several illusions:
- Hermann Grid Effect: Seeing gray dots at intersections in a white grid on a black background.
- Cornsweet Illusion: Two regions of the same shade of gray appear different because of the edge between them.
- Mach Bands: Exaggerated contrast at the boundary between slightly different shades.
Takeaway: Make shapes stand out by maximizing contrast with the background.
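One way to put a number on "contrast with the background" is the WCAG contrast ratio, which compares the relative luminance of two sRGB colors. The notes do not name this measure; it is brought in here as an assumption, and the function names are illustrative. A minimal sketch:

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel (0-255) to linear light (0.0-1.0)."""
    c = channel_8bit / 255.0
    # Piecewise sRGB transfer function.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple of 8-bit values."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG-style contrast ratio; 1.0 means identical luminance."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    white = (255, 255, 255)
    for name, color in [("black", (0, 0, 0)), ("mid gray", (128, 128, 128))]:
        print(name, "on white:", round(contrast_ratio(color, white), 2))
```

Black on white gives the largest possible ratio (about 21), and the ratio falls toward 1 as a mark's luminance approaches that of its background.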
Weber’s Law and Context Effects
We judge visual properties based on relative, not absolute, differences. Appearance is heavily influenced by surroundings.
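Weber's law is commonly written as ΔI / I = k: the smallest noticeable change ΔI grows in proportion to the baseline intensity I. The sketch below illustrates that relationship. The Weber fraction k differs by channel and observer, so the value 0.1 is a placeholder assumption, and the function names are hypothetical.

```python
def just_noticeable_difference(intensity, weber_fraction=0.1):
    """Smallest change expected to be noticed at a given baseline intensity.

    weber_fraction is a placeholder; real values depend on the channel.
    """
    return weber_fraction * intensity


def is_noticeable(baseline, changed, weber_fraction=0.1):
    """True if the change from baseline reaches the just-noticeable difference."""
    return abs(changed - baseline) >= just_noticeable_difference(baseline, weber_fraction)


if __name__ == "__main__":
    # The same absolute change (+5) is judged against different baselines.
    print(is_noticeable(10, 15))    # large relative change
    print(is_noticeable(100, 105))  # small relative change
```

The same absolute difference of 5 units counts as noticeable against a baseline of 10 but not against a baseline of 100, which is the sense in which judgments are relative rather than absolute.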
Pre-attentive Processing
Certain features “pop out” and can be detected quickly (<200 ms) without effort. Channels include color, shape, size, and orientation.
- Single Channel: Detecting a red dot among blue dots.
- Conjunction: Detecting a red square among red circles and blue squares takes more time.
Gestalt Principles
Rules on how our minds group visual elements into meaningful wholes:
- Similarity: Objects that look alike are perceived as belonging together.
- Proximity: Objects that are close to each other are perceived as a group.
- Connection: Objects that are visually connected are perceived as related.
- Continuity: Elements arranged along smooth lines or patterns are perceived as continuing together.
- Closure: We tend to see complete figures even when parts are missing.
- Figure/Ground: We perceive elements as either foreground or background.
- Common Fate: Objects that move in the same direction are perceived as a group.
Color in Visualization
Color Models
Represent color numerically:
- Additive (RGB): Used for light-emitting displays.
- Subtractive (CMYK): Used for print (Cyan, Magenta, Yellow, plus Black ink for true black).
- HSV/HSL: Based on Hue (“the color”), Saturation (intensity), and Value/Lightness (brightness/darkness).
- CIE LAB/LUV: Uniform color space where numerical distances correspond more closely to perceived color differences.
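The color models above are different numerical descriptions of the same color. The sketch below converts one RGB color to HSV and HSL with Python's standard `colorsys` module, and to CMYK with a naive, device-independent formula. The `rgb_to_cmyk` helper is an illustrative assumption: real print workflows rely on color profiles rather than this simple arithmetic.

```python
import colorsys


def rgb_to_cmyk(r, g, b):
    """Naive RGB -> CMYK conversion for channels in the range 0.0-1.0."""
    k = 1.0 - max(r, g, b)
    if k == 1.0:
        # Pure black: avoid dividing by zero.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return (c, m, y, k)


if __name__ == "__main__":
    red = (1.0, 0.0, 0.0)
    print("HSV:", colorsys.rgb_to_hsv(*red))   # (hue, saturation, value)
    print("HLS:", colorsys.rgb_to_hls(*red))   # (hue, lightness, saturation)
    print("CMYK:", rgb_to_cmyk(*red))
```

Note that `colorsys` returns HSL components in the order hue, lightness, saturation, and that all of its values are scaled to the range 0.0 to 1.0.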
Color Deficiencies
About 8% of men have some form of color blindness.
- Types: Monochromacy (total color blindness), Dichromacy (missing one type of cone, e.g., protanopia for red), Anomalous Trichromacy (a shift in one cone’s sensitivity, e.g., deuteranomaly, the most common form).
- Design Guidelines: Do not rely on color alone. Vary luminance, saturation, and hue. Use a monochrome-friendly palette and add redundant cues (like shape or text labels). The Vischeck tool simulates color blindness.
Color Perception Guidelines
- Luminance vs. Chrominance: We are much more sensitive to changes in brightness (L) than changes in color (C). Use luminance to encode fine details.
- Relativity: Color perception is dependent on context (simultaneous contrast) and the size of the mark.
- Saturation and Size: Use bright, saturated colors for small elements. Use low-saturation pastels for background areas.
Colormaps
A colormap defines the mapping from data values to colors.
- Types: Can be categorical (distinct types), sequential (ordered data low to high), or diverging (meaningful midpoint, like temperature).
- Expressiveness: The choice of colormap must match the type of data. Hue is best for categorical data (no inherent order). Luminance and saturation are best for ordered data (inherent order).
- Distinguishability: Humans can reliably distinguish only about 6–12 different colors at once, so keep the number of categorical colors small.
- Rainbow Colormaps: Problematic because hues have no natural perceptual ordering, and sharp transitions between hues create false boundaries in the data.
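The sequential and diverging types can be sketched as functions from a data value to a color. The sketch below varies lightness to carry order and uses hue only to separate the two sides of a diverging scale. It works in HSL through the standard `colorsys` module purely for simplicity, which is an assumption; a perceptually uniform space such as CIE LAB would be a better basis in practice. All function names, hue choices, and lightness ranges are illustrative.

```python
import colorsys


def normalize(value, vmin, vmax):
    """Scale value into [0, 1], clamping anything outside [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, t))


def sequential_color(value, vmin, vmax, hue=0.6):
    """Single hue; low values are light, high values are dark."""
    t = normalize(value, vmin, vmax)
    lightness = 0.95 - 0.7 * t
    return colorsys.hls_to_rgb(hue, lightness, 0.8)


def diverging_color(value, vmin, vmid, vmax, low_hue=0.6, high_hue=0.0):
    """Lightest at the midpoint; darker toward each extreme, one hue per side."""
    if value < vmid:
        distance = 1.0 - normalize(value, vmin, vmid)
        hue = low_hue
    else:
        distance = normalize(value, vmid, vmax)
        hue = high_hue
    lightness = 0.95 - 0.6 * distance
    return colorsys.hls_to_rgb(hue, lightness, 0.8)


if __name__ == "__main__":
    for v in (0, 5, 10):
        print("sequential", v, sequential_color(v, 0, 10))
    for v in (-10, 0, 10):
        print("diverging", v, diverging_color(v, -10, 0, 10))
```

Because order is carried by lightness rather than by hue, both maps avoid the rainbow problems listed above: equal steps in the data produce a steady change in brightness instead of abrupt hue boundaries.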
Data Abstraction
Data and Dataset Types
- Items: Discrete entities.
- Attributes: Properties or variables.
- Links: Relationships.
- Positions: Spatial data.
- Grids: How continuous data is sampled.
Attribute Types
- Categorical (Nominal): Unordered labels; only equality testing is meaningful.
- Ordinal: Ordered, but distance is not meaningful (e.g., S/M/L). Can test for order (<, >).
- Quantitative: Numerical data where arithmetic operations are meaningful. Ordered data (ordinal or quantitative) can be further subdivided into sequential (a single range from a minimum to a maximum), diverging (two directions from a meaningful midpoint, such as zero), or cyclic (values that wrap around, like months).
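The three attribute types differ in which comparisons make sense. A minimal sketch of that idea follows; the operator sets are an illustrative shorthand for the list above, not an exhaustive rule, and the names are hypothetical.

```python
from enum import Enum


class AttributeType(Enum):
    CATEGORICAL = "categorical"
    ORDINAL = "ordinal"
    QUANTITATIVE = "quantitative"


# Each type keeps the operations of the one before it and adds more.
MEANINGFUL_OPERATIONS = {
    AttributeType.CATEGORICAL: {"==", "!="},                  # equality only
    AttributeType.ORDINAL: {"==", "!=", "<", ">"},            # plus ordering
    AttributeType.QUANTITATIVE: {"==", "!=", "<", ">", "-"},  # plus differences
}


def is_meaningful(attribute_type, operation):
    """True if the operation is meaningful for the given attribute type."""
    return operation in MEANINGFUL_OPERATIONS[attribute_type]


if __name__ == "__main__":
    print(is_meaningful(AttributeType.ORDINAL, "<"))      # sizes S/M/L can be ordered
    print(is_meaningful(AttributeType.CATEGORICAL, "<"))  # unordered labels cannot
```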
Derived Attributes
Create new attributes by transforming existing ones (e.g., transforming Fahrenheit [quantitative] to hot/cold [ordinal]).
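The Fahrenheit example can be sketched as a small function that turns a quantitative attribute into an ordinal one. The 70 °F threshold, the sample readings, and the names `Warmth` and `derive_warmth` are all assumptions made for illustration.

```python
from enum import Enum


class Warmth(Enum):
    """Ordinal attribute: levels can be ordered, but their distance is not meaningful."""
    COLD = 0
    HOT = 1

    def __lt__(self, other):
        # Support order tests (<, >) between levels.
        if not isinstance(other, Warmth):
            return NotImplemented
        return self.value < other.value


HOT_THRESHOLD_F = 70.0  # hypothetical cut-off, chosen only for this sketch


def derive_warmth(fahrenheit):
    """Derive an ordinal label from a quantitative temperature."""
    return Warmth.HOT if fahrenheit >= HOT_THRESHOLD_F else Warmth.COLD


# Made-up readings in degrees Fahrenheit (quantitative).
readings_f = [31.5, 68.0, 88.2]
labels = [derive_warmth(t) for t in readings_f]

if __name__ == "__main__":
    for reading, label in zip(readings_f, labels):
        print(reading, "->", label.name)
```

The derived labels still support order tests such as `Warmth.COLD < Warmth.HOT`, but arithmetic on them is no longer meaningful.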
Data Model vs. Conceptual Model
- Data Model: The raw mathematical type (e.g., a list of floats).
- Conceptual Model: The real-world meaning (e.g., hot/cold).
Visual Encoding
Marks and Channels
- Marks: Basic graphical elements representing data items (points, lines, areas).
- Channels: Visual properties that control the appearance of marks (e.g., position, color, shape, size, orientation).
