SOC 222: Measuring the Social World Study Notes

Key Concepts and Definitions

Population vs. Sample

  • Population: The entire group you want to study. Example: All students at UTM.
  • Sample: A subset of the population used to draw conclusions about the whole population. Example: 100 UTM students surveyed in the library.
  • Population Parameter: The true value in the population. Example: The actual percentage of all UTM students who cheat.
  • Sample Statistic: The estimate derived from the sample. Example: 15% of surveyed students admit to cheating.
  • Sampling Error: The difference between a sample statistic and the population parameter that arises by chance, from random variation in who ends up in the sample. Example: The sample shows 15% cheating, but the true population rate is 12%.
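The four terms above can be made concrete with a small simulation. This is a minimal Python sketch that assumes a hypothetical population of 10,000 students in which exactly 12% cheat (invented numbers that mirror the example). Because the population is simulated, the parameter is known and the sampling error can be computed directly; in real research the parameter is unknown.

```python
import random

# Hypothetical population: 10,000 students, 1 = cheats, 0 = does not.
# The 12% rate mirrors the example above; the data are simulated, not real.
random.seed(42)
N_POP = 10_000
population = [1] * 1_200 + [0] * (N_POP - 1_200)

# Population parameter: the true proportion (known here only because we simulated it).
parameter = sum(population) / len(population)

# Sample statistic: the proportion of cheaters in a simple random sample of 100 students.
sample = random.sample(population, 100)
statistic = sum(sample) / len(sample)

# Sampling error: how far the estimate lands from the truth purely by chance.
sampling_error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:+.2f}")
```

Re-running with a different seed gives a different sample, a different statistic, and therefore a different sampling error, while the parameter stays fixed.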

Sampling Types and Biases

  • Probability Sampling: Each member has a known, non-zero chance of selection. Example: Simple Random Sample (SRS) where every UTM student has an equal chance to be picked.
  • Nonprobability Sampling: Selection without known probabilities. Example: Surveying only students who volunteer online.
  • Sampling Bias: Systematic error from a non-representative sample. Example: Surveying only students in the library excludes those who study elsewhere.
  • Response Bias: Systematic error that occurs when question wording or topic sensitivity leads respondents to give inaccurate answers. Example: Asking "Do you cheat on exams?" may cause underreporting due to social desirability.
  • Nonresponse Bias: Systematic error that occurs when the people who do not respond differ from those who do. Example: Students who cheat may be less likely to answer the survey, so the cheating rate is underestimated.
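Nonresponse bias can affect even a properly drawn random sample. The Python sketch below assumes a hypothetical population with a 12% cheating rate and invented response rates (30% for cheaters, 60% for non-cheaters). Because cheaters reply less often, the estimate computed from respondents falls well below the true rate, and surveying more people would not fix this.

```python
import random

# Simulated population: 12% cheat (hypothetical numbers for illustration).
random.seed(1)
population = [1] * 1_200 + [0] * 8_800

# Assumed response rates: cheaters answer the survey less often than non-cheaters.
RESPONSE_RATE = {1: 0.3, 0: 0.6}

sample = random.sample(population, 2_000)  # a proper simple random sample...
# ...but each sampled student only replies with probability RESPONSE_RATE[x].
respondents = [x for x in sample if random.random() < RESPONSE_RATE[x]]

true_rate = sum(population) / len(population)
estimate = sum(respondents) / len(respondents)
print(f"true={true_rate:.3f} estimate={estimate:.3f}")
```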

Representative Samples

  • A sample that reflects important characteristics of the population. Example: A sample including students from all faculties and years at UTM.
  • Larger samples reduce sampling error, although they do not correct sampling bias. Example: Surveying 500 randomly selected students gives more precise estimates than surveying 50.
  • Probability sampling helps achieve representativeness. Example: Using a random number generator to select participants.
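The claim that larger samples reduce sampling error can be checked by repeating random selection many times. This is a sketch under stated assumptions: a hypothetical population of 10,000 students with a true cheating rate of 12%, and `average_error` is an illustrative helper, not a standard function. `random.sample` plays the role of the random number generator that selects participants.

```python
import random
import statistics

random.seed(7)
# Hypothetical population of 10,000 students with a true cheating rate of 12%.
population = [1] * 1_200 + [0] * 8_800
TRUE_RATE = 0.12

def average_error(n, reps=1_000):
    """Mean absolute sampling error of the sample proportion over many SRS draws."""
    errors = []
    for _ in range(reps):
        sample = random.sample(population, n)  # every student equally likely to be picked
        errors.append(abs(sum(sample) / n - TRUE_RATE))
    return statistics.mean(errors)

err_50 = average_error(50)
err_500 = average_error(500)
print(f"n=50: {err_50:.3f}  n=500: {err_500:.3f}")
```

Any single sample of 500 can still miss by more than a sample of 50; the point is that, on average, the larger sample lands closer to the parameter.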

Levels of Measurement

  • Nominal: Categories without a natural order. Example: Student majors (Sociology, Psychology, Biology).
  • Ordinal: Ordered categories. Example: Class standing (Freshman, Sophomore, Junior, Senior).
  • Interval/Ratio: Continuous scales with meaningful distances. Example: Age in years or GPA scores.

Statistical Inference

  • Use samples to estimate population means and proportions. Example: Estimating the average GPA of all UTM students from a sample.
  • Compare means or proportions between groups. Example: Comparing the average GPA of male vs. female students.
  • Test independence between variables. Example: Testing if gender is independent of cheating behavior.
  • Estimate linear relationships (regression). Example: Predicting GPA based on hours studied per week.
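The regression bullet can be illustrated with a simple least squares line. The Python sketch below fits GPA on weekly study hours using invented data for six students; `least_squares` is a hypothetical helper written out by hand so the formula is visible. It computes the estimates only and does not perform an inference test.

```python
# Hypothetical data: weekly study hours and GPA for 6 students (invented for illustration).
hours = [2, 4, 6, 8, 10, 12]
gpa = [2.1, 2.5, 2.8, 3.2, 3.4, 3.8]

def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope

a, b = least_squares(hours, gpa)
print(f"predicted GPA = {a:.2f} + {b:.3f} * hours")
print(f"prediction for 5 hours/week: {a + b * 5:.2f}")
```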

Frequency Distributions

A frequency distribution shows how often a variable takes on different values.

Job Satisfaction          Frequency (f)
Very satisfied            4
Moderately satisfied      3
A little dissatisfied     2
Total (N)                 9
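A frequency distribution is just a count per category. In this Python sketch the nine raw responses are invented so that they reproduce the table's counts; `collections.Counter` does the tallying, and the categories are printed in their natural order rather than sorted by count.

```python
from collections import Counter

# Hypothetical raw responses chosen to reproduce the table above (9 respondents).
responses = [
    "Very satisfied", "Moderately satisfied", "Very satisfied",
    "A little dissatisfied", "Very satisfied", "Moderately satisfied",
    "A little dissatisfied", "Very satisfied", "Moderately satisfied",
]

# Keep the categories in their natural (ordinal) order rather than by count.
CATEGORIES = ["Very satisfied", "Moderately satisfied", "A little dissatisfied"]
freq = Counter(responses)

for category in CATEGORIES:
    print(f"{category:<24}{freq[category]}")
print(f"{'Total (N)':<24}{sum(freq.values())}")
```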

Proportions and Percentages

  • Proportion: The frequency of a category divided by the total number of cases. Formula: Proportion = f / N. Example: "Very satisfied" = 4/9 ≈ 0.44.
  • Percentage: The proportion multiplied by 100. Formula: Percentage = Proportion × 100. Example: 0.44 × 100 = 44%.
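Both formulas can be applied to every row of the job satisfaction table at once. A short Python sketch, using only the frequencies from the table:

```python
# Frequencies from the job satisfaction table.
freq = {"Very satisfied": 4, "Moderately satisfied": 3, "A little dissatisfied": 2}
N = sum(freq.values())  # total number of cases = 9

for category, f in freq.items():
    proportion = f / N             # Proportion = f / N
    percentage = proportion * 100  # Percentage = Proportion × 100
    print(f"{category:<24}{proportion:.2f}  {percentage:.1f}%")
```

Without rounding the proportion first, "Very satisfied" comes out as 44.4% rather than 44%; the small difference is only rounding. The proportions across all categories always sum to 1.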

Visualizing Data Distributions

Bar Charts and Pie Charts

  • Bar Chart: Displays frequencies or percentages for categories. Each bar's height (or length) is proportional to the frequency or percentage it represents.
  • Pie Chart: Shows the proportion of categories as slices of a circle. Note: Bar charts are often preferred in academic work because bar lengths are easier to compare than slice angles.
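The defining rule of a bar chart, bar length proportional to frequency, can be shown without any plotting library. This plain-text Python sketch uses the job satisfaction frequencies; `text_bar_chart` is a hypothetical helper, and in practice a plotting package would draw the chart.

```python
freq = {"Very satisfied": 4, "Moderately satisfied": 3, "A little dissatisfied": 2}

def text_bar_chart(counts, symbol="#"):
    """Return one line per category; the bar length equals the frequency."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {symbol * f} {f}" for label, f in counts.items()]

for line in text_bar_chart(freq):
    print(line)
```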

Histograms and Line Graphs

  • Histogram: Used for continuous (interval/ratio) data. It groups values into bins (ranges) and shows how many cases fall into each bin.
  • Line Graph: Shows trends over levels of another variable, often time. Example: Plotting average happiness over several years.
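Binning is the step that turns continuous values into a histogram. The Python sketch below uses invented ages for 12 respondents and 5-year bins starting at 18; the bin width and starting point are arbitrary choices, and different choices change the histogram's shape.

```python
# Hypothetical ages of 12 survey respondents (invented for illustration).
ages = [18, 19, 19, 20, 21, 21, 22, 23, 25, 27, 31, 38]

def bin_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:  # values outside the bin range are ignored
            counts[k] += 1
    return counts

counts = bin_counts(ages, start=18, width=5, n_bins=5)
for k, c in enumerate(counts):
    lo = 18 + k * 5
    print(f"{lo}-{lo + 4}: {'#' * c}")  # labels assume whole-number ages
```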

Measures of Center

  • Mode: The most common value or category. Best for nominal variables.
  • Median: The middle value when data are ordered. Best for skewed data or data with outliers.
  • Mean (Average): Sum of all values divided by the number of values. Formula: Mean = Σx / n. Sensitive to outliers.
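Python's standard `statistics` module computes all three measures, which makes the outlier point easy to see. The study-hour values here are invented: one extreme value (60 hours) is added to show that it pulls the mean far more than the median.

```python
import statistics

# Hypothetical weekly study hours for 7 students (invented for illustration).
hours = [5, 6, 6, 7, 8, 9, 10]
with_outlier = hours + [60]  # one student reports 60 hours

mode_value = statistics.mode(hours)      # most common value
median_value = statistics.median(hours)  # middle value of the ordered data
mean_value = statistics.mean(hours)      # Σx / n

# The outlier pulls the mean far more than the median.
median_out = statistics.median(with_outlier)
mean_out = statistics.mean(with_outlier)
print(mode_value, median_value, round(mean_value, 2))
print(median_out, mean_out)
```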

Measures of Spread

  • Range: The difference between the largest and smallest values. Formula: Range = Max – Min.
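  • Interquartile Range (IQR): The difference between the 3rd quartile (Q3) and the 1st quartile (Q1). Measures the spread of the middle 50% of the data. Formula: IQR = Q3 – Q1.
  • Variance (s²): Roughly the average squared deviation from the mean; for a sample, the sum of squared deviations is divided by n – 1 rather than n. Formula: s² = Σ(x – x̄)² / (n – 1).
  • Standard Deviation (s): The square root of the variance. It describes the typical distance of values from the mean, expressed in the original units.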
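A Python sketch of the four spread measures, using invented study-hour data. `statistics.variance` and `statistics.stdev` use the n – 1 denominator for sample data. One caution: there are several conventions for computing quartiles; `statistics.quantiles` defaults to its "exclusive" method, and other software (including R's default) may report slightly different Q1, Q3, and IQR for the same data.

```python
import statistics

# Hypothetical weekly study hours for 7 students (invented for illustration).
hours = [5, 6, 6, 7, 8, 9, 10]

data_range = max(hours) - min(hours)           # Range = Max - Min
q1, q2, q3 = statistics.quantiles(hours, n=4)  # quartiles ("exclusive" method by default)
iqr = q3 - q1                                  # spread of the middle 50%
variance = statistics.variance(hours)          # sample variance, divides by n - 1
sd = statistics.stdev(hours)                   # square root of the sample variance

print(f"range={data_range} IQR={iqr} s²={variance:.2f} s={sd:.2f}")
```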

Choosing Measures Based on Data Type

Data Type                      Measure of Center    Measure of Spread
Nominal                        Mode                 N/A
Ordinal                        Median or Mean       IQR or Standard Deviation
Interval/Ratio (continuous)    Mean                 Standard Deviation or IQR

Practical Tips and R Software

  • Honesty: Assure respondents of anonymity to encourage honest answers and reduce response bias.
  • Study Habits: Avoid distractions and get at least 7 hours of sleep to support memory.
  • R Software Steps:
    1. Load necessary packages.
    2. Set the working directory to your data folder.
    3. Read the data into R, making sure the file name matches exactly.

Academic Integrity

Ethical conduct is essential. Avoid the following:

  • Copying definitions from Wikipedia.
  • Working together on individual assignments.
  • Copying answers from others or using AI to generate answers.