Statistical Analysis: Regression and Probability Models

Regression Analysis and Predictive Modeling

Regression analysis is a statistical method used to model the relationship between variables and to predict the value of one variable using another.

Main Types of Regression

  • Simple linear regression: One independent variable and one dependent variable.
  • Multiple regression: Several independent variables predicting one dependent variable.
  • Logistic regression: Used when the dependent variable is categorical (often binary); the model estimates the probability of each category.
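
As a rough illustration of the first type, simple linear regression, the sketch below fits a least-squares line with one independent variable. The function name `fit_simple_linear_regression` and the hours/scores data are hypothetical, chosen only for this example.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: hours studied (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_simple_linear_regression(hours, scores)
predicted = intercept + slope * 6  # predict the score for 6 hours of study

print(f"score = {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted score for 6 hours: {predicted:.2f}")
```

Multiple regression follows the same idea with several independent variables, which in practice is usually delegated to a statistics library rather than written by hand.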

The goal of simple linear regression

Essential Statistics: Sampling, Distributions, and Testing

1. Sampling and Basic Concepts

Population: The entire group being studied.
Sample: A subset of the population.

Example

  • Population: All university students.
  • Sample: 200 students surveyed.

Parameter vs. Statistic

  • Parameter: A numerical value describing a population.
  • Statistic: A numerical value derived from a sample.

Examples:

  • p = True population proportion.
  • p̂ (p-hat) = Sample proportion.

Sample Proportion Formula

p̂ = x / n

Where:

  • x = Number of successes.
  • n = Sample size.

Example: 48 of the 80 people sampled support a policy, so x = 48, n = 80, and p̂ = 48 / 80 = 0.60.
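
The sample proportion formula can be checked with a few lines of Python. This is a minimal sketch using the 48-out-of-80 figures from the example; the function name `sample_proportion` is an assumption made for illustration.

```python
def sample_proportion(successes, sample_size):
    """Return p-hat = x / n for x successes in a sample of size n."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return successes / sample_size


# x = 48 supporters, n = 80 people sampled.
p_hat = sample_proportion(48, 80)
print(p_hat)  # 0.6
```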

CIS 2500 Exam 1 Excel Cheat Sheet: Data and Statistics

CIS 2500 – Exam 1 Cheat Sheet (Excel Focused)

Chapter 1 – Data Basics

  • Population: All items in the study.
  • Sample: Subset of the population.
  • Parameter: Numerical value describing a population.
  • Statistic: Numerical value describing a sample.
  • Cross-sectional: Many entities at one time.
  • Time series: One entity across multiple points in time.
  • Nominal: Labels only (numeric or non-numeric).
  • Ordinal: Ranked categories (numeric or non-numeric).
  • Interval: Numeric, no true zero.
  • Ratio: Numeric, true zero.
  • Qualitative

Essential Programming and Data Science Q&A

Python Fundamentals

Q: What is the difference between if, elif, and else?

  • if checks an initial condition.
  • elif checks another condition if the previous one is false.
  • else runs if none of the preceding conditions are true.
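
A small sketch of the three branches working together. The function name `letter_grade` and the score cutoffs are hypothetical and serve only to show the control flow.

```python
def letter_grade(score):
    """Map a numeric score to a grade label using if / elif / else."""
    if score >= 90:       # initial condition
        return "A"
    elif score >= 80:     # checked only if the first condition was false
        return "B"
    else:                 # runs when none of the conditions above were true
        return "C or below"


print(letter_grade(95))
print(letter_grade(85))
print(letter_grade(40))
```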

Q: When do we use a for loop instead of while?

  • We use for when the number of iterations is known in advance or when iterating over a sequence.
  • We use while when a condition controls how long the loop runs.
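
The contrast can be sketched as follows. The variable names and numbers are illustrative assumptions, not taken from the original notes.

```python
# for: the number of iterations is known up front (the integers 1 through 5).
total = 0
for n in range(1, 6):
    total += n

# while: the loop runs until its condition becomes false,
# and we do not know beforehand how many passes that takes.
value = 100.0
steps = 0
while value >= 1:
    value /= 2   # halve the value on every pass
    steps += 1

print(total)
print(steps)
```

Because `value` shrinks on every pass, the while condition eventually becomes false and the loop ends.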

Q: What is an infinite loop?

A loop that never stops because its condition is always true.

Q: What is a function?

A reusable block

Statistical Inference: Sampling and Confidence Intervals

Chapter 9: Sampling Distributions

Quantile-Quantile Plot (QQ-Plot)

Empirical Rule: For approximately normal (bell-shaped) data, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, respectively.
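
The three percentages can be reproduced from the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The printed shares round to roughly 0.68, 0.95, and 0.997, matching the rule.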

Standard Normal Distribution

The Standard Normal distribution has a mean of 0 and a standard deviation of 1.

Example: To find the percentage of babies that weigh less than 95 ounces at birth, first convert the value 95 to a standardized score (STAT): subtract the mean and divide by the standard deviation.
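
A minimal sketch of that conversion follows. The mean and standard deviation of birth weights are not given in this excerpt, so the values 110 and 15 ounces below are purely hypothetical placeholders; only the 95-ounce cutoff comes from the example.

```python
from statistics import NormalDist

# HYPOTHETICAL parameters for illustration only (not from the original notes).
mean_weight_oz = 110.0
sd_weight_oz = 15.0

x = 95.0  # cutoff from the example, in ounces

# Standardized score: how many standard deviations x lies from the mean.
z = (x - mean_weight_oz) / sd_weight_oz

# Area to the left of z under the Standard Normal curve.
share_below = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"share below 95 oz = {share_below:.4f}")
```

With real data, the two placeholder parameters would be replaced by the actual mean and standard deviation of the birth weights.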

Based

Statistical Analysis and Predictive Modeling in Excel

Descriptive Statistics and Central Tendency

Descriptive statistics are the numbers that summarize a dataset, giving you a quick “snapshot” of its typical values and how much they vary. These are divided into Measures of Central Tendency (the middle) and Measures of Dispersion (the spread).
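
As a quick illustration of such a snapshot, the sketch below computes two measures of central tendency and two measures of dispersion with Python's standard `statistics` module. The dataset is hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical dataset for illustration.
data = [4, 8, 6, 5, 7]

center_mean = mean(data)            # central tendency: arithmetic average
center_median = median(data)        # central tendency: middle value when sorted
spread_range = max(data) - min(data)  # dispersion: distance between extremes
spread_stdev = stdev(data)          # dispersion: sample standard deviation

print(center_mean, center_median, spread_range, round(spread_stdev, 4))
```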

1. Measures of Central Tendency

These identify the “center” of your data where most values congregate.

  • Mean (Average): The sum of all values divided by the total count. It is the most common measure but is highly