Understanding Key Statistical Concepts and Theorems

Law of Large Numbers

As you take larger and larger samples from a population, the sample mean (x̄) tends to get closer and closer to μ (the population mean), provided the population has a finite mean.
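A minimal simulation sketch of this behaviour follows. The normal population with μ = 50 and standard deviation 10, the sample sizes, and the function name are all assumptions chosen for illustration; they do not come from the text.

```python
import random

def sample_means_by_size(population_mean, population_sd, sizes, seed=0):
    """Draw one sample per size from a normal population and return its mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = {}
    for n in sizes:
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means[n] = sum(sample) / n
    return means

# Hypothetical population: mu = 50, sd = 10
means = sample_means_by_size(50.0, 10.0, sizes=[10, 1_000, 100_000])
for n, x_bar in means.items():
    print(f"n = {n:>6}: sample mean = {x_bar:.3f}, distance from mu = {abs(x_bar - 50.0):.3f}")
```

The distance from μ need not shrink at every single step, but for large n it is very likely to be small.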

Sampling Distribution

The sampling distribution of the mean approaches a normal distribution as n (the sample size) increases.

Central Limit Theorem

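The larger the sample size, the closer the sampling distribution of the mean is to a normal distribution, whatever the shape of the population (provided the population has finite variance).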
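One way to see this is to sample from a clearly non-normal population and watch the distribution of sample means become more symmetric as n grows. The sketch below assumes an exponential population (strongly right-skewed) and uses skewness as a rough stand-in for "how normal"; both choices, and the helper names, are illustrative assumptions.

```python
import random
import statistics

def skewness(values):
    """Population skewness: near 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def sample_mean_skewness(n, reps=4000, seed=1):
    """Skewness of `reps` sample means, each from n exponential(1) draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)

skew_by_n = {n: sample_mean_skewness(n) for n in (2, 10, 50)}
for n, s in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {s:.2f}")
```

The skewness of the sample means should fall toward 0 as n increases, which is consistent with the distribution of the mean approaching a normal shape.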

Standard Error

The standard error is the standard deviation of the distribution of the sample means.

T-distributions

Understanding Probability, Distributions, and Statistical Analysis

Understanding Probability

The probability of an event is a number that measures how likely the event is to occur. It lies between 0 and 1, inclusive: 0 denotes an event that cannot occur, and 1 denotes an event that is certain to occur. For example, when we toss a coin, we can enumerate all the possible outcomes (head and tail), but we cannot say in advance which one will happen.
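The coin example can be sketched in code under one added assumption: every outcome in the sample space is equally likely, so a probability is the count of favourable outcomes divided by the count of all outcomes. The function name and the "edge" outcome are hypothetical.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Favourable outcomes / all outcomes, assuming equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if o in event]
    return Fraction(len(favourable), len(outcomes))

coin = ["head", "tail"]
p_head = probability({"head"}, coin)              # 1/2
p_impossible = probability({"edge"}, coin)        # 0: not in the sample space
p_certain = probability({"head", "tail"}, coin)   # 1: covers every outcome
print(p_head, p_impossible, p_certain)
```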

Permutations

Permutation means arrangement of objects in a definite

Epidemiology: Key Concepts and Measures

Proportion: Numerator is always a subset of the denominator; dimensionless (0 to 1 or 0% to 100%).

Rates: Describe the change in one quantity per unit of time. Unit = 1/time. Range: 0 to infinity.
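The difference between the two measures shows up in a small calculation. The sketch below uses made-up numbers, and it assumes that the time denominator of the rate is total person-time of observation, which the text does not state.

```python
def proportion(subset_count, total_count):
    """Dimensionless share of a whole; always between 0 and 1."""
    if total_count <= 0 or not 0 <= subset_count <= total_count:
        raise ValueError("numerator must be a subset of a positive denominator")
    return subset_count / total_count

def rate(event_count, person_time):
    """Events per unit of time (unit = 1/time); can exceed 1."""
    if person_time <= 0 or event_count < 0:
        raise ValueError("need non-negative events and positive person-time")
    return event_count / person_time

# Hypothetical data: 12 cases among 200 people, observed for 480 person-years
p = proportion(12, 200)   # 0.06, i.e. 6%
r = rate(12, 480)         # 0.025 cases per person-year
print(f"proportion = {p:.2%}, rate = {r:.3f} per person-year")
```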

Risk

Equation

Proportion | Dimensionless (can be expressed as a percent) | Appropriate for fixed populations with minimal losses to follow-up because we assume everyone was followed for a specific period.

  • New Cases: Numerator. New cases, not pre-existing ones. For diseases that can occur more than once, it is the first occurrence

Statistics Essentials: Stem Plots, Quartiles, Correlation

Stem Plots

To make a stem plot:

  1. Separate each observation into a stem (all but the final digit) and a leaf (the final digit). Stems may have as many digits as needed, but each leaf contains only a single digit.
  2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line to the right of this column. Include all stems needed to span the data, even with no leaves.
  3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
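The three steps above can be sketched as a short function. This version assumes non-negative integer observations, so the leaf is the units digit and the stem is everything else; the function name and the sample data are hypothetical.

```python
def stem_plot(values):
    """Return the rows of a stem plot for non-negative integers."""
    if not values:
        return []
    leaves_by_stem = {}
    for v in sorted(values):                 # sorting keeps leaves in increasing order
        stem, leaf = divmod(v, 10)           # step 1: split into stem and final digit
        leaves_by_stem.setdefault(stem, []).append(leaf)
    lo, hi = min(leaves_by_stem), max(leaves_by_stem)
    width = len(str(hi))
    rows = []
    for stem in range(lo, hi + 1):           # step 2: every stem, even with no leaves
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem:>{width}} | {leaves}".rstrip())  # step 3: leaves to the right
    return rows

rows = stem_plot([23, 12, 47, 15, 40, 21, 23])
print("\n".join(rows))
# 1 | 25
# 2 | 133
# 3 |
# 4 | 07
```

Note that stem 3 is printed even though no observation falls in the 30s, as step 2 requires.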

Quartiles and Interquartile

Data Attributes and Similarity Measures: Exercises

Data Attributes Classification and Analysis

Exercise 1: Attribute Types

Classify the following attributes as binary, discrete, or continuous. Also, classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Briefly indicate your reasoning if there may be some ambiguity.

  1. Time in terms of AM or PM.
    Binary, qualitative, ordinal.
  2. Brightness as measured by a light meter.
    Continuous, quantitative, ratio.
  3. Brightness as measured by people’s judgments.
    Discrete, qualitative, ordinal.

Statistical Measures: Location, Variance, and Probability

Measures of Location

Minimum: Smallest number in the data set.

Maximum: Largest number in the data set.

Median: The middle number, or the average of the two middle numbers if the data set has an even number of values.

Mean: The sum of all numbers in the data set divided by how many numbers there are.

Mode: The most frequent number. A data set can have multiple modes.
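These measures can be computed with Python's standard statistics module. The data set below is made up for illustration and was chosen to have an even number of values and two modes.

```python
import statistics

data = [2, 4, 4, 5, 7, 7]   # hypothetical data set

minimum = min(data)                    # 2
maximum = max(data)                    # 7
median = statistics.median(data)       # 4.5, the average of the two middle values 4 and 5
mean = statistics.mean(data)           # 29 / 6, about 4.83
modes = statistics.multimode(data)     # [4, 7], both values occur twice

print(minimum, maximum, median, round(mean, 2), modes)
```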

Quartiles

First Quartile (Q1): The median of the lower half of the data.

Third Quartile (Q3): The median of the upper half of the data.
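A minimal sketch of the median-of-halves definition follows. The text does not say what to do with the middle value when the number of observations is odd; this sketch assumes it is left out of both halves, which is one common convention among several. The function name is hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]   # assumption: skip the median itself when n is odd
    return statistics.median(lower), statistics.median(upper)

q1_odd, q3_odd = quartiles([1, 3, 5, 7, 9, 11, 13])      # halves [1, 3, 5] and [9, 11, 13]
q1_even, q3_even = quartiles([1, 2, 3, 4, 5, 6, 7, 8])   # halves [1, 2, 3, 4] and [5, 6, 7, 8]
print(q1_odd, q3_odd, q1_even, q3_even)
```

Other software may use interpolation-based conventions and return slightly different quartiles for the same data.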

Measures of Variance

Range: Maximum –
