Statistical Analysis: Regression and Probability Models
Regression Analysis and Predictive Modeling
Regression analysis is a statistical method used to model the relationship between variables and to predict the value of one variable using another.
Main Types of Regression
- Simple linear regression: One independent variable and one dependent variable.
- Multiple regression: Several independent variables predicting one dependent variable.
- Logistic regression: Used when the dependent variable is categorical (often binary); the model estimates the probability of each category.
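As a minimal sketch of the first type in the list above, a simple linear regression line can be fit by ordinary least squares. The helper name `fit_line` and the data are hypothetical, chosen only for illustration; they do not come from the original notes.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With one independent variable (x) and one dependent variable (y), the fitted slope and intercept can then be used to predict y for a new x.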
The goal of simple linear regression
Essential Statistics: Sampling, Distributions, and Testing
1. Sampling and Basic Concepts
Population: The entire group being studied.
Sample: A subset of the population.
Example
- Population: All university students.
- Sample: 200 students surveyed.
Parameter vs. Statistic
- Parameter: A numerical value describing a population.
- Statistic: A numerical value derived from a sample.
Examples:
- p = True population proportion.
- p̂ (p-hat) = Sample proportion.
Sample Proportion Formula
p̂ = x / n
Where:
- x = Number of successes.
- n = Sample size.
Example: 48 of 80 respondents support a policy (x = 48, n = 80).
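The formula p̂ = x / n can be checked against the example figures with a short sketch. The function name `sample_proportion` is a hypothetical helper, not something defined in the notes.

```python
def sample_proportion(x, n):
    """Return the sample proportion p-hat = x / n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return x / n


# Figures from the example: 48 successes in a sample of 80
p_hat = sample_proportion(48, 80)
print(p_hat)  # 0.6
```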
CIS 2500 Exam 1 Excel Cheat Sheet: Data and Statistics
CIS 2500 – Exam 1 Cheat Sheet (Excel Focused)
Chapter 1 – Data Basics
- Population: All items in the study.
- Sample: Subset of the population.
- Parameter: Numerical value describing a population.
- Statistic: Numerical value describing a sample.
- Cross-sectional: Many entities at one time.
- Time series: One entity across multiple points in time.
- Nominal: Labels only (numeric or non-numeric).
- Ordinal: Ranked categories (numeric or non-numeric).
- Interval: Numeric, no true zero.
- Ratio: Numeric, true zero.
- Qualitative
Essential Programming and Data Science Q&A
Python Fundamentals
Q: What is the difference between if, elif, and else?
if checks an initial condition. elif checks another condition if the previous one is false. else runs if none of the preceding conditions are true.
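A small sketch of the three branches working together. The function `classify` and its thresholds are hypothetical, used only to show the order in which the conditions are checked.

```python
def classify(score):
    """Label a score using an if / elif / else chain."""
    if score >= 90:        # checked first
        return "high"
    elif score >= 50:      # checked only if the first condition was false
        return "medium"
    else:                  # runs when no earlier condition was true
        return "low"


print(classify(95), classify(70), classify(20))
```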
Q: When do we use a for loop instead of while?
- We use for when we know the number of iterations. while is used when the condition controls the loop execution.
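The contrast can be sketched as follows. The variable names and starting values are hypothetical, chosen only to illustrate the two loop styles.

```python
# for: the number of iterations is known in advance (five passes)
total = 0
for i in range(1, 6):
    total += i

# while: the loop runs until its condition becomes false,
# and the number of passes is not stated up front
value = 100
halvings = 0
while value > 10:
    value //= 2
    halvings += 1

print(total, halvings, value)
```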
Q: What is an infinite loop?
A loop that never stops because its condition is always true.
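As an illustrative sketch, the loop below has a condition that is always true, so it would never stop on its own; the explicit break is what ends it. The counter and the limit of 3 are hypothetical values for demonstration.

```python
count = 0
while True:            # the condition is always true
    count += 1
    if count >= 3:     # without this exit, the loop would run forever
        break

print(count)
```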
Q: What is a function?
A reusable block
Statistical Inference: Sampling and Confidence Intervals
Chapter 9: Sampling Distributions
Quantile-Quantile Plot (QQ-Plot)
Empirical Rule: For approximately normally distributed data, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, respectively.
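The three percentages can be reproduced from the normal cumulative distribution function. This is a minimal sketch using only the standard library; the helper names `normal_cdf` and `within` are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def within(k):
    """Probability of falling within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)


for k in (1, 2, 3):
    print(k, round(within(k), 4))
```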
Standard Normal Distribution
The Standard Normal distribution has a mean of 0 and a standard deviation of 1.
Example: If you want to know the percentage of babies that weigh less than 95 ounces at birth, you must first convert the value 95 to a standardized score (STAT).
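The conversion can be sketched as below. The notes do not state the mean or standard deviation of birth weights at this point, so the values 120 and 20 ounces are purely hypothetical placeholders, and `normal_cdf` is an illustrative helper.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical parameters (NOT from the original notes)
mean_weight = 120.0   # ounces, assumed
sd_weight = 20.0      # ounces, assumed

x = 95.0
z = (x - mean_weight) / sd_weight   # standardized score
p_below = normal_cdf(z)             # share of babies weighing less than x

print(z, round(p_below, 4))
```

Under these assumed values the standardized score is negative, which reflects that 95 ounces lies below the assumed mean.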
Based
Statistical Analysis and Predictive Modeling in Excel
Descriptive Statistics and Central Tendency
Descriptive statistics are the numbers that summarize a dataset, giving you a quick “snapshot” of its typical values and how much they vary. These are divided into Measures of Central Tendency (the middle) and Measures of Dispersion (the spread).
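Although these notes focus on Excel, the same summary can be sketched in Python with the standard `statistics` module. The dataset below is hypothetical, and the population standard deviation is used here as one possible measure of spread.

```python
import statistics

# Hypothetical dataset for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency (the middle)
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion (the spread)
data_range = max(data) - min(data)
std_dev = statistics.pstdev(data)   # population standard deviation

print(mean, median, mode, data_range, std_dev)
```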
1. Measures of Central Tendency
These identify the “center” of your data where most values congregate.
- Mean (Average): The sum of all values divided by the total count. It is the most common measure but is highly
