Essential Probability and Statistics Principles
Counting Principles and Techniques
Fundamental Counting Principle
Fundamental Counting Principle: If a task or process is made up of stages with separate (distinct) choices at each stage, the total number of ways to complete the task is m × n × p × …, where m is the number of choices at the first stage, n is the number of choices at the second stage, and so forth.
Tree Diagrams vs. Fundamental Counting Principle
Tree Diagrams: A tree diagram is a useful tool for organized counting that shows all possible outcomes of a situation using branches.
Tree Diagram vs. FCP: The Fundamental Counting Principle (FCP) identifies how many outcomes exist, while tree diagrams identify what those specific outcomes are.
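A minimal Python sketch of the contrast above, using a hypothetical outfit problem (the item names are made up for illustration). `math.prod` applies the FCP to count the outcomes, while `itertools.product` walks every branch of the tree diagram to list them.

```python
import math
from itertools import product

# Hypothetical stages of a task: each list holds the distinct choices at that stage.
stages = [
    ["red shirt", "blue shirt", "white shirt"],  # stage 1: m = 3 choices
    ["jeans", "khakis"],                          # stage 2: n = 2 choices
    ["sneakers", "boots"],                        # stage 3: p = 2 choices
]

# Fundamental Counting Principle: HOW MANY outcomes (m × n × p).
fcp_count = math.prod(len(stage) for stage in stages)

# Tree diagram: WHAT the outcomes are (one tuple per path from root to leaf).
outcomes = list(product(*stages))

print(fcp_count)       # 12
print(len(outcomes))   # 12
print(outcomes[0])     # ('red shirt', 'jeans', 'sneakers')
```

Listing is only practical for small problems; for large ones the FCP count is all you need.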
The Indirect Method
Indirect Method: A problem-solving technique that finds the number of desired cases by counting the unwanted cases and subtracting them from the total. Formula: Total without restrictions − Unwanted cases.
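A short sketch of the indirect method on a hypothetical problem: how many 3-digit codes (digits 0 to 9, repetition allowed) contain at least one repeated digit? Counting the unwanted cases (all digits different) is easier than counting the wanted ones directly; a brute-force count confirms the result.

```python
from itertools import product

# Hypothetical problem: 3-digit codes using digits 0-9 (repetition allowed).
# How many codes contain at least one repeated digit?

total = 10 ** 3               # total without restrictions (FCP: 10 × 10 × 10)
unwanted = 10 * 9 * 8         # unwanted cases: all three digits different
indirect_answer = total - unwanted

# Brute-force check: list every code and count the ones with a repeat directly.
direct_answer = sum(
    1 for code in product(range(10), repeat=3) if len(set(code)) < 3
)

print(indirect_answer, direct_answer)  # 280 280
```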
Probability Fundamentals and Event Types
Mutually Exclusive Events and the Rule of Sum
Mutually Exclusive: Events that cannot occur at the same time. When two mutually exclusive events are both desirable, add their numbers of outcomes to find the total number of ways that one or the other can occur. This is known as the Rule of Sum.
Independent and Dependent Events
- Independent Event: The outcome of one event has no effect on the outcome of another event.
- Dependent Event: An event whose outcome depends directly on the outcome of another event.
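A sketch assuming a standard 52-card deck with 4 aces: the probability of drawing two aces in a row when the first card is replaced (independent draws) versus not replaced (dependent draws). In each case the probability of the first draw is multiplied by the probability of the second draw given what happened on the first; `Fraction` keeps the answers exact.

```python
from fractions import Fraction

# Drawing two cards from a standard 52-card deck (4 aces).

# Independent: the first card is replaced, so the second draw is unaffected.
p_independent = Fraction(4, 52) * Fraction(4, 52)

# Dependent: the first card is NOT replaced, so the second draw depends on the first
# (one fewer ace and one fewer card remain).
p_dependent = Fraction(4, 52) * Fraction(3, 51)

print(p_independent)  # 1/169
print(p_dependent)    # 1/221
```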
Principle of Inclusion and Exclusion
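Principle of Inclusion/Exclusion: Use this formula to find the number of people or elements included in a Venn diagram: n(A ∪ B) = n(A) + n(B) − n(A ∩ B). Elements in the overlap are counted once in n(A) and again in n(B), so the intersection is subtracted once.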
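For two sets, the principle is n(A ∪ B) = n(A) + n(B) − n(A ∩ B). A minimal sketch with hypothetical survey data (the names are invented for illustration), checked against a direct count of the union. It also shows that for mutually exclusive sets the intersection is empty, so the formula reduces to the Rule of Sum.

```python
# Hypothetical survey: students who play soccer (A) and who play basketball (B).
A = {"Ava", "Ben", "Cal", "Dee", "Eli"}
B = {"Dee", "Eli", "Fay", "Gus"}

# Inclusion/exclusion: add both groups, then subtract the overlap counted twice.
n_union = len(A) + len(B) - len(A & B)

print(n_union)      # 7
print(len(A | B))   # 7 (direct count of the union agrees)

# Mutually exclusive sets have an empty intersection, so the formula reduces to
# the Rule of Sum: n(C or D) = n(C) + n(D).
C = {"Hal", "Ivy"}
D = {"Jon"}
assert len(C & D) == 0
print(len(C) + len(D) - len(C & D))  # 3
```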
Types of Probability and Odds
Defining Probability
Probability: The likelihood of something occurring, represented as a fraction, decimal, or percent. The probability of an event is the measure of the chance that the event will occur as a result of an experiment.
Empirical, Subjective, and Theoretical Probability
- Empirical Probability: Probability based on experimental trials, specifically direct observations or experiences. Formula: (Number of times event E occurs) / (Total number of trials) or P(E) = n(E) / n(T).
- Subjective Probability: Probability that reflects personal belief. This involves personal judgment, information, and intuition; it is based on very little, if any, mathematical data.
- Theoretical Probability: Probability based on mathematical analysis (theory and models). Formula: (Number of all favorable outcomes in the event) / (Number of all possible outcomes in the sample space).
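A sketch comparing theoretical and empirical probability for rolling a 6 on a fair die. The theoretical value comes from counting the sample space; the empirical value comes from simulated trials (a stand-in for real experiments), so it only approximates 1/6 and changes with the random seed.

```python
import random
from fractions import Fraction

# Theoretical probability: favorable outcomes / outcomes in the sample space.
sample_space = [1, 2, 3, 4, 5, 6]
favorable = [x for x in sample_space if x == 6]
p_theoretical = Fraction(len(favorable), len(sample_space))

# Empirical probability: run trials and use P(E) = n(E) / n(T).
random.seed(1)                      # fixed seed so the run is repeatable
n_trials = 10_000
n_event = sum(1 for _ in range(n_trials) if random.randint(1, 6) == 6)
p_empirical = n_event / n_trials

print(p_theoretical)                # 1/6
print(p_empirical)                  # close to 0.1667; varies with the seed
```

With more trials, the empirical probability tends to settle closer to the theoretical value.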
Odds vs. Probability
Odds: Compare the number of favorable outcomes to the number of unfavorable outcomes. Odds are another way to express a level of confidence about an outcome. Formula (odds in favor): Odds = P(x) / (1 − P(x)).
Difference between Odds and Probability: Odds express likelihood as a part-to-part ratio (favorable to unfavorable). Probability expresses likelihood as a part-to-whole comparison (favorable to total), written as a fraction, decimal, or percent.
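A small sketch converting between the two forms. The odds formula is the one given above; the reverse conversion, odds of a:b giving a probability of a / (a + b), is the standard rule but is not stated in these notes. The function names are hypothetical.

```python
from fractions import Fraction

def odds_in_favor(p):
    """Odds in favor = P(x) / (1 - P(x)), i.e. favorable : unfavorable."""
    return p / (1 - p)

def probability_from_odds(a, b):
    """Odds of a:b (favorable:unfavorable) -> probability a / (a + b)."""
    return Fraction(a, a + b)

# Rolling a 6 on a fair die: probability 1/6 (part-to-whole).
p = Fraction(1, 6)
odds = odds_in_favor(p)
print(odds)                          # 1/5, i.e. odds of 1:5 (part-to-part)
print(probability_from_odds(1, 5))   # 1/6
```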
Statistical Analysis and Data Sets
Outliers and Correlation
Outliers: Extreme values that do not represent the general nature of the data. Ways to limit the effect of outliers include:
- Removing outliers completely and recalculating.
- Using a larger sample set.
- Choosing a different measure of central tendency, such as the median, which only looks at the middle value rather than every value.
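Correlation Coefficient (r): Ranges from −1 to 1. The sign gives the direction of the relationship (positive or negative), and the absolute value |r| gives its strength: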
- Weak: 0 to 0.33
- Moderate: 0.33 to 0.67
- Strong: 0.67 to 1.0
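The notes give strength cut-offs but not how r is computed. This sketch assumes the Pearson correlation coefficient (the usual choice) and classifies |r| with the cut-offs above; sending boundary values such as 0.33 to the stronger class is an assumption. The data are hypothetical. Python 3.10+ also provides `statistics.correlation`, but the formula is written out here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, which always lies between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Classify |r| using the cut-offs above (boundary values go to the stronger class)."""
    size = abs(r)
    if size < 0.33:
        return "weak"
    if size < 0.67:
        return "moderate"
    return "strong"

# Hypothetical data: hours studied vs. test score, and hours of TV vs. quiz mark.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
tv_hours = [1, 2, 3, 4, 5]
quiz = [9, 7, 8, 4, 3]

r1 = pearson_r(hours, score)
r2 = pearson_r(tv_hours, quiz)
print(round(r1, 2), strength(r1))   # 0.99 strong
print(round(r2, 2), strength(r2))   # -0.92 strong
```

The negative r2 shows why strength is read from |r|: the relationship is strong even though it slopes downward.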
Permutations, Combinations, and Sets
- Permutation: A selection from a group where order matters.
- Combination: A selection from a group of items without regard to order.
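- Empty (Null) Set: A set that contains no elements, denoted using {} or Ø. In counting problems, the null set represents the "no choice" choice. Example: when counting the sums of money that can be made from a collection of bills, choosing no bills produces no sum of money, so the null set is excluded from the count.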
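A sketch of the three ideas above with hypothetical numbers. `math.perm` and `math.comb` (Python 3.8+) count ordered and unordered selections; the bill example lists every non-empty selection, showing why the null set is removed from the 2³ possible subsets.

```python
import math
from itertools import combinations

# Choosing 3 of 5 students (hypothetical example).
n, r = 5, 3
print(math.perm(n, r))   # 60: order matters (e.g. president, VP, treasurer)
print(math.comb(n, r))   # 10: order does not matter (e.g. a 3-person committee)

# Null set in counting: sums of money from one $5, one $10, and one $20 bill.
# There are 2**3 = 8 subsets, but the empty subset (no bills) gives no sum.
bills = [5, 10, 20]
selections = [c for k in range(1, len(bills) + 1) for c in combinations(bills, k)]
print(len(selections))                       # 7 == 2**3 - 1
print(sorted(sum(c) for c in selections))    # [5, 10, 15, 20, 25, 30, 35]
```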
Measures of Central Tendency and Spread
Mean, Median, and Mode
- Mean: This value is found by adding the data values and dividing by the number of values (the average). The mean is easily skewed by outliers since it includes every value in its calculation. It is most useful when the data are fairly symmetric with no extreme values.
- Median: This value is found by ranking data values from least to greatest and selecting the middle value (for an even number of values, take the mean of the two middle values). The median is useful when data has outliers or is unevenly spread because it is not heavily affected by extreme values.
- Mode: The value that occurs most frequently in the data set.
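A sketch using Python's `statistics` module on hypothetical quiz scores, showing how a single outlier pulls the mean far more than the median or mode.

```python
import statistics

# Hypothetical quiz scores; a score of 100 is then added as an outlier.
scores = [62, 65, 65, 68, 70]
with_outlier = scores + [100]

# Without the outlier: mean 66, median 65, mode 65.
print(statistics.mean(scores), statistics.median(scores), statistics.mode(scores))

# With the outlier: the mean jumps to about 71.7, the median only moves to 66.5
# (an even count, so the two middle values 65 and 68 are averaged), the mode stays 65.
print(statistics.mean(with_outlier), statistics.median(with_outlier),
      statistics.mode(with_outlier))
```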
Deviation and Variance
Deviation: The difference between an individual value and the mean for the data. The larger the deviations, the greater the spread in the data. The standard deviation summarizes the typical size of the deviations:
- Higher Standard Deviation: Data points are scattered further away from the mean.
- Lower Standard Deviation: Data points cluster around the mean.
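Variance (s²): Measures the average squared distance from the mean. The standard deviation, s, is the square root of the variance, which puts the spread back into the same units as the data.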
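A worked sketch on a hypothetical data set, computing deviations, variance, and standard deviation by hand and checking them against the `statistics` module. Note the assumption about the divisor: a population variance divides the squared deviations by n, while the sample variance (conventionally written s²) divides by n − 1.

```python
import math
import statistics

data = [4, 8, 6, 5, 7]        # hypothetical data set
mean = statistics.mean(data)  # 6

deviations = [x - mean for x in data]        # [-2, 2, 0, -1, 1]; they always sum to 0
squared = [d ** 2 for d in deviations]       # [4, 4, 0, 1, 1]

# Population variance divides by n; sample variance s^2 divides by n - 1.
pop_var = sum(squared) / len(data)           # 10 / 5 = 2.0
sample_var = sum(squared) / (len(data) - 1)  # 10 / 4 = 2.5
s = math.sqrt(sample_var)                    # sample standard deviation, about 1.58

# The standard library gives the same results.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
```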
Probability Distributions and Fair Games
Discrete and Hypergeometric Distributions
- Discrete Uniform Probability Distribution: In a single trial, all outcomes are equally likely.
- Binomial Probability Distribution: Criteria include independent trials, the same probability for each trial, and exactly two outcomes (success and failure).
- Hypergeometric Probability Distribution: Criteria include dependent trials, changing probability with each trial (as when sampling without replacement), and exactly two outcomes (success and failure).
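The notes list the criteria but not the formulas. This sketch uses the standard probability formulas P(X = k) = C(n, k)pᵏ(1 − p)ⁿ⁻ᵏ for the binomial case and C(a, k)C(N − a, n − k) / C(N, n) for the hypergeometric case. The example assumes a standard deck: exactly 2 hearts in 5 cards, drawn with replacement (independent trials) versus without replacement (dependent trials). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeometric_pmf(k, n, successes, population):
    """P(X = k) when drawing n items without replacement from a population
    that contains `successes` success items."""
    failures = population - successes
    return Fraction(comb(successes, k) * comb(failures, n - k), comb(population, n))

# Exactly 2 hearts in 5 cards from a standard deck (13 hearts out of 52 cards).
with_replacement = binomial_pmf(2, 5, Fraction(13, 52))   # independent trials
without_replacement = hypergeometric_pmf(2, 5, 13, 52)    # dependent trials

print(with_replacement, float(with_replacement))   # 135/512, about 0.264
print(float(without_replacement))                  # about 0.274
```

In both cases the probabilities for k = 0 to 5 add up to 1, as any probability distribution must.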
Fair Games and Expected Value
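Fair Game: A game in which no player has an advantage. To test a game, use the weighted mean, or expected value. Formula: E(x) = ∑x ⋅ P(x), where x represents the net gain (money or points) for each outcome and P(x) is its probability. A game is fair when E(x) = 0.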
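A sketch of the expected-value test on a hypothetical dice game. Each outcome is paired with the player's net gain (payout minus entry fee) and its probability; a total of 0 means the game is fair.

```python
from fractions import Fraction

def expected_value(outcomes):
    """E(x) = sum of x * P(x) over (net gain x, probability P(x)) pairs."""
    return sum(x * p for x, p in outcomes)

# Hypothetical game: pay $1 to roll a fair die; a 6 pays back $6.
# Net gains: +$5 with probability 1/6, -$1 with probability 5/6.
game = [(5, Fraction(1, 6)), (-1, Fraction(5, 6))]
print(expected_value(game))      # 0 -> fair

# Same game with a $2 entry fee: net gains of +$4 or -$2.
pricier = [(4, Fraction(1, 6)), (-2, Fraction(5, 6))]
print(expected_value(pricier))   # -1 -> the player loses $1 per game on average
```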
Normal Distribution and Z-Scores
Properties of Normal Distribution:
- Symmetric, meaning mode = mean = median.
- The more data clusters around the mean, the narrower the bell-shaped curve (indicating a smaller standard deviation).
- The highest point of the curve is the mode.
- The area under the curve represents the probability; therefore, the total area is always 1.
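- Follows the 68-95-99.7 Rule: about 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean, respectively.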
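A sketch checking the 68-95-99.7 Rule with `statistics.NormalDist` (Python 3.8+). The area under the standard normal curve between −k and k standard deviations is found by subtracting cumulative areas.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations of the mean.
for k in (1, 2, 3):
    area = standard.cdf(k) - standard.cdf(-k)
    print(k, round(area, 4))   # 1 0.6827, 2 0.9545, 3 0.9973

# The total area under the curve is 1 (the cdf approaches 1 far to the right).
print(round(standard.cdf(10), 6))  # 1.0
```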
Z-Score: The number of standard deviations that a datum (data point) lies from the mean: z = (x − μ) / σ. A positive z-score indicates a value above the mean, a negative z-score indicates a value below the mean, and z = 0 indicates a value equal to the mean.
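A minimal sketch of the z-score formula z = (x − μ) / σ on hypothetical test results (mean 70, standard deviation 8). The function name is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean: z = (x - mean) / sd."""
    return (x - mean) / sd

# Hypothetical test results: mean 70, standard deviation 8.
mean, sd = 70, 8
for mark in (86, 70, 62):
    print(mark, z_score(mark, mean, sd))   # 86 2.0, 70 0.0, 62 -1.0
```

A mark of 86 lies 2 standard deviations above the mean, and a mark of 62 lies 1 standard deviation below it.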
