Fundamentals of Statistical Graphics and Data Analysis
Understanding Statistical Graphics
A statistical graphic is a visual representation of statistical data that gives an overall impression of the material and makes it quicker to comprehend. Graphics are an alternative to tables for representing frequency distributions. Recommended requirements for building a graph include simplicity, avoiding exaggerated scale distortions, and choosing the chart type to suit both the objectives and the measurement level of the variables.
Types of Statistical Graphs
- Bar Graph: Used to represent the frequency distribution of a qualitative or discrete variable. Each category is represented by a bar whose length indicates the frequency of observations in that category.
- Divided Bar Graph: Used to study the frequency distribution of a discrete variable (with a few categories) within different levels of another discrete variable.
- Pie Chart (Sector Graph): An alternative to divided bar charts, this type of graph shows the partition of a total into its component parts. To construct it, a circle is used, where 360° corresponds to 100% of the cases. The process involves expressing each partial amount as a percentage or relative frequency, converting that value into degrees, and drawing the corresponding angles.
- Pictograms: A way of representing information using pictures of the objects being studied, formatted for a quick, visual understanding of the frequency distribution. They are useful for advertising as they are attractive and easy to interpret.
- Histogram: This graph is especially suitable for representing frequencies of continuous variables measured at the interval or ratio level. It consists of a series of adjacent bars whose areas are proportional to the frequency of the interval over which they rise. If the intervals have equal width, the height of each rectangle is proportional to the corresponding frequency; if the widths differ, the height is the frequency divided by the interval width.
- Scatter Diagram: Used when studying the possible association between two variables at the interval or ratio level. It is useful to represent the observations on a Cartesian coordinate system. This results in a cloud of points on the plane, called a scatter diagram or correlation graph.
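The pie chart construction described above (partial amount, then relative frequency, then degrees) can be sketched in a few lines of Python. This is only an illustration: the function name `sector_angles` and the category counts are hypothetical and do not come from the source.

```python
def sector_angles(counts):
    """Map each category to (relative frequency, sector angle in degrees)."""
    total = sum(counts.values())
    result = {}
    for category, count in counts.items():
        relative = count / total       # share of the total, between 0 and 1
        degrees = count * 360 / total  # 360 degrees corresponds to 100% of cases
        result[category] = (relative, degrees)
    return result


# Hypothetical counts, for illustration only.
counts = {"A": 10, "B": 25, "C": 15}
for category, (relative, degrees) in sector_angles(counts).items():
    print(f"{category}: {relative:.0%} of cases, {degrees:.0f} degrees")
```

Whatever the counts are, the angles add up to 360 degrees, which is a quick check before drawing the sectors.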
Key Statistical Concepts
Statistics involves collecting data, organizing it into tables and graphs, and analyzing it to achieve a specific goal.
- Frequency: The number of times a certain value of the variable is repeated.
- Relative Frequency: The proportion of the total represented by each value of the variable. Multiplying the relative frequency by 100 gives the percentage.
- Cumulative Frequency: The sum of the frequency of a particular value and the frequencies of all previous values. It shows how much data has accumulated up to a certain point.
- Sample Size (n): The number of observations in the sample being analyzed.
- Population Size (N): The total size of the population.
- Population: A collection of all items that share a common feature.
- Representative Sample: A subset of cases drawn from a population, usually at random, whose characteristics reflect those of the population as a whole.
- Variable Types: Variables can be Qualitative, Quantitative Discrete (integers), or Quantitative Continuous (real values).
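As an illustration of the frequency concepts above, the following sketch builds a small frequency table with absolute, relative, and cumulative frequencies. The function name `frequency_table` and the sibling counts are hypothetical, chosen only to show the arithmetic.

```python
from collections import Counter


def frequency_table(data):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(data)
    n = len(data)  # sample size
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]  # running total up to and including this value
        rows.append((value, counts[value], counts[value] / n, cumulative))
    return rows


# Hypothetical number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]
for value, freq, rel, cum in frequency_table(siblings):
    print(f"{value}: frequency={freq}, percentage={rel * 100:.0f}%, cumulative={cum}")
```

The last cumulative frequency always equals the sample size n, and the relative frequencies sum to 1.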
Measures of Central Tendency
- Arithmetic Mean: The sum of all values of the variable divided by the number of values.
- Mode: The value that is repeated most often (the highest absolute frequency). A dataset can be unimodal, bimodal, or have no mode.
- Median: The value that divides the sorted data, leaving 50% of the data below it and 50% above it.
- Range: The difference between the highest and lowest values in a dataset. Strictly speaking, it measures dispersion rather than central tendency.
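The measures listed above can be computed with Python's standard `statistics` module. The sketch below is one possible way to do it; the helper name `summarize` and the household sizes are hypothetical.

```python
import statistics


def summarize(data):
    """Compute the mean, median, mode(s), and range of a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # multimode returns every most-frequent value, so bimodal data gives two.
        "modes": statistics.multimode(data),
        "range": max(data) - min(data),
    }


# Hypothetical household sizes, for illustration only.
household_sizes = [2, 3, 3, 4, 5, 3, 1, 6]
print(summarize(household_sizes))
```

Using `multimode` rather than `mode` makes bimodal datasets visible: it returns a list containing all values tied for the highest frequency.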
Interpreting Statistical Data: Examples
- There are 4 people who have no siblings.
- 18% of students have 3 siblings.
- There are 31 students who have at most 2 siblings.
- On average, between 3 and 4 people live in each home.
- The most frequent number of people living in a home is 3.
- In 50% of households, 3 or fewer people live together, and in the other 50%, 3 or more people live together.
- The values typically lie about 1.75 away from the arithmetic mean (the standard deviation).
- The average of the squared distances from the arithmetic mean is 3.06 (the variance, whose square root is the standard deviation: √3.06 ≈ 1.75).
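The last two statements describe how far the values spread around the mean. The sketch below, on hypothetical data, computes three related quantities: the mean of the absolute distances, the mean of the squared distances (the population variance, dividing by n), and the square root of the variance (the standard deviation). The function names are illustrative and not taken from the source.

```python
import math


def mean_absolute_deviation(data):
    """Average of the absolute distances from the arithmetic mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


def population_variance(data):
    """Average of the squared distances from the arithmetic mean (divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def population_std_dev(data):
    """Square root of the population variance."""
    return math.sqrt(population_variance(data))


# Hypothetical data, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_absolute_deviation(values))
print(population_variance(values))
print(population_std_dev(values))
```

The mean absolute deviation and the standard deviation generally differ, so it is worth stating which one a reported "average distance from the mean" refers to.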