Visual Analytics: Design Principles and Data Visualization Techniques

Visual Analytics

Goal

Create tools and techniques to enable people to:

  • Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data.
  • Detect the expected and discover the unexpected.
  • Provide timely, defensible, and understandable assessments.
  • Communicate these assessments effectively for action.

Definition of Visualization

Provide visual representations of datasets designed to help people carry out tasks more effectively.

Design Abstraction Levels

Problem Characterization

Describe specific issues of the application domain and end users involved, such as the problem to solve, user demands, and datasets. Designers obtain outcomes at this level through end-user interviews, direct observation of their work in real work conditions, and research about end-user work.

Data and Task Abstractions

Make abstractions of the specific tasks and data involved in the application domain and map them to generic representations independent from the concrete application domain. For tasks, identify the abstract tasks required by end users in their workflow.

Interaction and Visual Encoding

Determine the specific design choices for creating and manipulating the visual representations of the abstract data types selected at the previous abstraction level, guided by the abstract tasks identified at that level. Each distinct possible approach is called an idiom.

Algorithmic Implementation

Achieve an efficient implementation of the visual encoding and the interaction techniques selected in the previous abstraction level.

Data Types

Attributes

A property that can be measured, observed, or recorded.

  • Categorical: Values with no implicit ordering (e.g., marital status: single or married).
  • Ordinal (ordered): Values with a well-defined ordering on which arithmetic is not meaningful (e.g., finishing first or second in a race).
  • Quantitative (ordered): Measurements of magnitude that support arithmetic (e.g., weight or height). Both ordinal and quantitative attributes are ordered.
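
The distinction between the three attribute types can be sketched in terms of which operations are meaningful on their values. The names below (AttributeType, MEANINGFUL_OPERATIONS, supports) are hypothetical and only illustrate the classification above; they do not come from any library.

```python
from enum import Enum


class AttributeType(Enum):
    CATEGORICAL = "categorical"
    ORDINAL = "ordinal"
    QUANTITATIVE = "quantitative"


# Illustrative mapping: each type supports everything the previous one does, plus more.
MEANINGFUL_OPERATIONS = {
    AttributeType.CATEGORICAL: {"equality"},
    AttributeType.ORDINAL: {"equality", "comparison"},
    AttributeType.QUANTITATIVE: {"equality", "comparison", "arithmetic"},
}


def supports(attr_type, operation):
    """Return True if the operation is meaningful for the attribute type."""
    return operation in MEANINGFUL_OPERATIONS[attr_type]
```

For example, supports(AttributeType.ORDINAL, "comparison") holds (first place comes before second), while supports(AttributeType.ORDINAL, "arithmetic") does not (first plus second has no meaning).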

Items

An individual entity that is discrete (a row in a simple table or a node in a network).

Links

A relationship between items, usually within a network.

Grids

A grid defines a strategy for sampling continuous data in terms of the geometric and topological relationships between its cells.

Positions

Spatial data, typically in 2D or 3D (e.g., latitude and longitude coordinates).

Marks

Basic geometric elements that depict items or links. Marks can be classified according to the number of dimensions required for their representation (points, lines, areas, volumes).

Channels

Control the appearance of the marks (spatial position [alignment, 3D, regions], color [hue, saturation, luminance], size [length, area, volume], orientation, curvature, shape, motion, patterns).

Expressiveness Principle

Specifies that visual encoding should express all, and only, the information in the dataset attributes.

Accuracy

Determines how close human perception is to some objective measure of stimulation (e.g., a graph with continuous lines vs. a normal graph).

Discriminability

Determines if there are noticeable differences between different items encoded with a particular visual channel.

Separability

Establishes a continuous gradation between pairs of channels ranging from those channels that are orthogonal and independently separable to the channels whose combination is inherently integral (not separable).

Popout

Many visual channels provide visual popout, by which an item immediately stands out from the rest. The great value of popout is that the time required to spot the distinct item does not depend on the number of distractors (e.g., a red dot among many black dots).

Grouping

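Items can be shown as belonging to a group through spatial proximity (placing them close together) or through link marks that connect them.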

Most Effective Channels to Represent

  • Quantitative and ordered attributes: 2D spatial position on a common (aligned) scale, 2D spatial position on an unaligned scale, length.
  • Categorical: Grouping of items within the same region.

Hue

Representation of a pure color. It is used for categorical values and groupings. For categorical attributes, it is the most effective channel after spatial position.

Saturation

Mixture of a pure color with white. It is a magnitude (how-much) channel suitable for ordered data.

Lightness

Mixture of pure color with black.

Luminance

Visually perceived brightness (description of the visual channel in black and white). It is a suitable channel to represent ordered quantitative values.
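
As a rough illustration of how hue, saturation, lightness, and luminance differ, the sketch below uses Python's standard colorsys module. Two assumptions are worth flagging: colorsys works in the HLS model, where lowering saturation moves a color toward gray (not strictly toward white) and lightness runs from black to white; and relative_luminance is a hypothetical helper that applies the standard sRGB relative-luminance weights as an approximation of perceived brightness.

```python
import colorsys


def hls_to_rgb255(hue, lightness, saturation):
    """Convert HLS components (each in 0..1) to an (r, g, b) tuple in 0..255."""
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return (round(r * 255), round(g * 255), round(b * 255))


def relative_luminance(rgb255):
    """Approximate perceived brightness (0..1) of an sRGB color."""
    def linearize(channel):
        # Undo the sRGB gamma curve before weighting the channels.
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb255)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


red = hls_to_rgb255(0.0, 0.5, 1.0)      # pure hue, fully saturated
gray = hls_to_rgb255(0.0, 0.5, 0.0)     # same hue, zero saturation
yellow = hls_to_rgb255(1 / 6, 0.5, 1.0)
blue = hls_to_rgb255(2 / 3, 0.5, 1.0)
```

Yellow and blue here share the same lightness and saturation, yet relative_luminance(yellow) is far higher than relative_luminance(blue). This is one reason hue alone is a poor choice for ordered data.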

Other Channels

  • Size: Well suited to ordered data.
  • Angle: Encodes quantitative information through the orientation of a mark, that is, the direction in which it points. It is more effective than area.
  • Curvature: Not a very effective quantitative channel (similar to volume); it can only be used with line marks.
  • Shape: A what/where (identity) channel that can be used with point and line marks, most commonly points.
  • Motion: Highly separable from the most effective channels, such as static spatial position and color.
  • Textures: Can be seen as a combination of three parameters: orientation, scale, and contrast. They can encode categorical attributes with up to ten values.

Channels Ranking

  • Categorical (from most to least effective): Spatial region > hue > motion > shape.
  • Other data types (from most to least effective): Spatial position (2D) on common scale > spatial position (2D) on unaligned scale > length > orientation > area > depth > luminance > curvature > saturation > volume.
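
One way to put the ranking to use is a lookup that returns the most effective channel still unused for a given attribute kind. This is a minimal sketch: the lists copy the rankings above, best_available is a hypothetical helper, and it ignores interactions between channels (for instance, spatial region and spatial position both consume space).

```python
# Rankings copied from the notes above, most effective first.
CHANNEL_RANKING = {
    "categorical": ["spatial region", "hue", "motion", "shape"],
    "ordered": [
        "position on common scale",
        "position on unaligned scale",
        "length",
        "orientation",
        "area",
        "depth",
        "luminance",
        "curvature",
        "saturation",
        "volume",
    ],
}


def best_available(kind, used=()):
    """Return the highest-ranked channel for `kind` that is not already in `used`."""
    for channel in CHANNEL_RANKING[kind]:
        if channel not in used:
            return channel
    raise ValueError(f"no unused channel left for {kind!r} attributes")
```

For instance, once spatial region is taken, best_available("categorical", used={"spatial region"}) falls back to hue.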

Charts

Scatterplots

  • Scatterplots/Bubble Plots (scatterplot + size + color): Encode two quantitative attributes using the horizontal and vertical spatial position channels; bubble plots add the size and color channels to encode further attributes. Effective for overviewing the dataset, characterizing distributions, finding outliers and extreme values, and correlating two attributes.
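
The position and size encodings of a scatterplot or bubble plot can be sketched as two small scale functions. Both helpers are hypothetical. Scaling the bubble radius with the square root of the value, so that the mark's area is proportional to the value, is a common convention assumed here; the notes do not prescribe it.

```python
import math


def linear_scale(value, domain, pixel_range):
    """Map a value from the data domain (min, max) to a pixel range (start, end)."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


def bubble_radius(value, max_value, max_radius=20.0):
    """Radius such that the circle's area is proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


# A point with x = 5 on a 0..10 axis drawn across 400 pixels.
x_pixel = linear_scale(5, (0, 10), (0, 400))
# A bubble encoding 25 when the largest value is 100.
radius = bubble_radius(25, 100)
```

With these defaults, a value that is one quarter of the maximum gets half the maximum radius, and therefore one quarter of the maximum area.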

Bar Charts

  • Bar Charts: Encode a quantitative attribute with spatial position (y-axis) and a categorical attribute with spatial region (x-axis). The regions need enough space between them to keep the marks separate. Used for lookup and comparison of values.
  • Stacked Bar Charts: For each component, multiple subcomponents are stacked in one dimension. The total length of the entire glyph encodes a value, just as the length of each subcomponent does. They use both length and color as visual channels. Used for part-to-whole relationships, value lookup, and finding trends.
  • Streamgraph: A more complex generalization of stacked bar charts, where the main axis (horizontal or vertical) represents an ordered attribute (time) and the complementary spatial dimension represents a categorical key attribute together with a quantitative attribute. They can represent the continuity of a dataset over time.
  • Dot Charts: Encode a quantitative attribute against an ordered attribute using the spatial position of point marks.
  • Line Charts: Dot charts with line marks connecting the points. Used to show trends.
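
The stacking described for stacked bar charts comes down to a running baseline. The sketch below uses a hypothetical helper, stack_segments, to compute where each subcomponent starts and ends along the stacking axis.

```python
def stack_segments(values):
    """Return a (start, end) pair for each subcomponent stacked along one axis."""
    segments = []
    baseline = 0
    for value in values:
        segments.append((baseline, baseline + value))
        baseline += value  # the next subcomponent starts where this one ends
    return segments


# Three subcomponents of one bar; the last end value is the bar's total length.
segments = stack_segments([3, 2, 5])
```

Only the first subcomponent starts at the shared baseline of 0. The others have to be compared by length alone, which is why comparing the upper subcomponents across bars is harder.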

Matrix

  • Heatmaps: Each attribute value in a cell is represented by a two-dimensional mark that has been assigned a color. Used to find clusters, outliers, and summarize data.
  • Scatterplot Matrix: Each cell contains a scatterplot of one pair of attributes. Used to find correlations, trends, and outliers.

Parallel Coordinates

Each attribute is drawn as one of several parallel axes. An item is represented by a polyline that crosses every axis exactly once, at the point corresponding to the item's value for that attribute. Used to search for correlations between attributes.
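
A minimal sketch of how one item becomes a polyline: each attribute value is normalized to its axis and paired with the axis position. The function name, the parameters (axis_spacing, axis_height), and the sample attributes are all hypothetical.

```python
def polyline(item, attribute_order, ranges, axis_spacing=100.0, axis_height=200.0):
    """Return the (x, y) vertices of the polyline for one item.

    item: dict mapping attribute name to value
    attribute_order: attribute names in the order the axes are drawn
    ranges: dict mapping attribute name to its (min, max) over the dataset
    """
    points = []
    for index, attribute in enumerate(attribute_order):
        low, high = ranges[attribute]
        # Normalize to 0..1; a constant attribute is placed mid-axis.
        t = (item[attribute] - low) / (high - low) if high != low else 0.5
        points.append((index * axis_spacing, t * axis_height))
    return points


ranges = {"weight": (10, 30), "height": (100, 200)}
vertices = polyline({"weight": 20, "height": 150}, ["weight", "height"], ranges)
```

The order of attribute_order matters: only neighboring axes are directly connected, so correlations are easiest to read between adjacent attributes.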

Pie Charts

The radial representation most used in the field of statistics. Values are encoded with the angle (and area) of each wedge, so estimates are less accurate than those based on the length of line marks. Used for part-to-whole relationships.
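
The part-to-whole encoding of a pie chart can be sketched as a conversion from values to wedge angles. pie_angles is a hypothetical helper; angles are in degrees and start at 0.

```python
def pie_angles(values):
    """Return a (start_angle, end_angle) pair in degrees for each wedge."""
    total = sum(values)
    wedges = []
    start = 0.0
    for value in values:
        sweep = 360.0 * value / total  # each wedge gets its share of the circle
        wedges.append((start, start + sweep))
        start += sweep
    return wedges


wedges = pie_angles([1, 1, 2])
```

Here the third value is half of the total, so its wedge spans half of the circle.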

Choropleth Map

Shows a quantitative attribute encoded as color over geographic regions, which are drawn as two-dimensional (area) marks.

Colormaps

  • Diverging: Starts with a strong color, gradually fades toward a neutral midpoint, then shifts to a different hue and increases in intensity until it ends in a second strong color.
  • Sequential: Starts with a strong color and gradually fades.
  • Qualitative/Categorical: Many strong colors.
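
The three colormap types can be sketched with plain linear interpolation in RGB. All colors and function names below are illustrative assumptions, not values from a specific library or palette, and production colormaps usually interpolate in a perceptually uniform color space.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two (r, g, b) colors, with t in 0..1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def sequential(t, pale=(247, 251, 255), strong=(8, 48, 107)):
    """One hue that goes from pale (t = 0) to strong (t = 1)."""
    return lerp_color(pale, strong, t)


def diverging(t, low=(178, 24, 43), mid=(247, 247, 247), high=(33, 102, 172)):
    """Two strong colors meeting at a neutral midpoint (t = 0.5)."""
    if t < 0.5:
        return lerp_color(low, mid, t / 0.5)
    return lerp_color(mid, high, (t - 0.5) / 0.5)


def qualitative(index, palette=((228, 26, 28), (55, 126, 184), (77, 175, 74))):
    """Distinct strong colors for categories; cycles if there are too many."""
    return palette[index % len(palette)]
```

Sequential maps suit ordered data with a single direction, diverging maps suit data with a meaningful midpoint (such as zero), and qualitative maps suit categorical data.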

Visual Channels

  • Spatial position: Where something is located in space.
  • Spatial region: The region of space that something occupies.
  • Length: The length of something.
  • Orientation: The angle of something.
  • Area: The two-dimensional size of something.
  • Luminance: A darker or lighter color.
  • Saturation: How vivid a color is, from a pure color to one washed out with white.