SOC 222: Measuring the Social World Study Notes

Posted on Apr 5, 2026 in Statistics

SOC 222: Measuring the Social World

Key Concepts and Definitions

Population vs. Sample

Population: The entire group you want to study. Example: All students at UTM.
Sample: A subset of the population used to draw conclusions about the whole population. Example: 100 UTM students surveyed in the library.
Population Parameter: The true value in the population. Example: The actual percentage of all UTM students who cheat.
Sample Statistic: The estimate derived from the sample. Example: 15% of surveyed students admit to cheating.
Sampling Error: The difference between the sample statistic and the population parameter due to chance. Example: Sample shows 15% cheating but the true population rate is 12%.
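
To make these terms concrete, here is a minimal Python sketch. The population is simulated and hypothetical (10,000 students with a 12% cheating rate chosen to match the example above); it is not real UTM data. It shows how the parameter, the statistic, and the sampling error relate to each other.

```python
import random

# Hypothetical population: 10,000 students, 12% of whom cheat (1 = cheats, 0 = does not).
# These numbers are made up for illustration only.
rng = random.Random(222)  # fixed seed so the run is repeatable
population = [1] * 1200 + [0] * 8800

# Population parameter: the true proportion in the whole population.
parameter = sum(population) / len(population)

# Sample: 100 students drawn at random, without replacement.
sample = rng.sample(population, 100)

# Sample statistic: the proportion observed in the sample.
statistic = sum(sample) / len(sample)

# Sampling error: how far the sample statistic lands from the parameter.
sampling_error = statistic - parameter

print(f"parameter      = {parameter:.2f}")
print(f"statistic      = {statistic:.2f}")
print(f"sampling error = {sampling_error:+.2f}")
```

Changing the seed gives a different sample and therefore a different sampling error, while the parameter stays fixed.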

Sampling Types and Biases

Probability Sampling: Each member has a known, non-zero chance of selection. Example: Simple Random Sample (SRS) where every UTM student has an equal chance to be picked.
Nonprobability Sampling: Selection without known probabilities. Example: Surveying only students who volunteer online.
Sampling Bias: Systematic error from a non-representative sample. Example: Surveying only students in the library excludes those who study elsewhere.
Response Bias: When question wording or topic sensitivity affects answers. Example: Asking “Do you cheat on exams?” may cause underreporting due to social desirability.
Nonresponse Bias: When certain groups do not respond, skewing results. Example: Students who cheat might avoid answering the survey.

Representative Samples

A sample that reflects important characteristics of the population. Example: A sample including students from all faculties and years at UTM.
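Larger random samples tend to have smaller sampling error. Example: A random sample of 500 students usually gives a more precise estimate than a random sample of 50 students. A larger sample does not correct sampling bias.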
Probability sampling helps achieve representativeness. Example: Using a random number generator to select participants.
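
The two ideas above (random selection and the effect of sample size) can be sketched in Python as follows. The student IDs and GPA values are simulated assumptions, and `average_error` is a hypothetical helper written for this illustration only.

```python
import random

rng = random.Random(2026)  # fixed seed so the run is repeatable

# Hypothetical sampling frame of student ID numbers (illustrative only).
student_ids = list(range(1, 10001))

# Simple random sample: every ID has the same chance of being selected.
srs = rng.sample(student_ids, 100)

# Hypothetical population of GPAs, simulated only to show the effect of sample size.
gpas = [min(4.0, max(0.0, rng.gauss(3.0, 0.5))) for _ in student_ids]
true_mean = sum(gpas) / len(gpas)

def average_error(n, repeats=200):
    """Average absolute gap between the sample mean and the true mean
    over many random samples of size n."""
    total = 0.0
    for _ in range(repeats):
        sample = rng.sample(gpas, n)
        total += abs(sum(sample) / n - true_mean)
    return total / repeats

error_50 = average_error(50)
error_500 = average_error(500)
print(f"average error with n=50:  {error_50:.3f}")
print(f"average error with n=500: {error_500:.3f}")
```

Averaged over many draws, the samples of 500 land closer to the true mean than the samples of 50, although any single small sample can still happen to be accurate.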

Levels of Measurement

Nominal: Categories without a natural order. Example: Student majors (Sociology, Psychology, Biology).
Ordinal: Ordered categories. Example: Class standing (Freshman, Sophomore, Junior, Senior).
Interval/Ratio: Numeric scales with equal, meaningful distances between values; ratio scales also have a true zero. Example: Age in years or GPA scores.

Statistical Inference

Use samples to estimate population means and proportions. Example: Estimating the average GPA of all UTM students from a sample.
Compare means or proportions between groups. Example: Comparing the average GPA of male vs. female students.
Test independence between variables. Example: Testing if gender is independent of cheating behavior.
Estimate linear relationships (regression). Example: Predicting GPA based on hours studied per week.
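
As a rough illustration of the regression example, the Python sketch below fits a least-squares line by hand. The five data points are invented for the illustration, and `predict_gpa` is a hypothetical helper; the sketch only fits a line to the sample and does not test whether the relationship holds in the population.

```python
# Hypothetical data for five students (made up for illustration).
hours_studied = [2, 4, 6, 8, 10]
gpa = [2.4, 2.8, 3.0, 3.4, 3.6]

n = len(hours_studied)
mean_x = sum(hours_studied) / n
mean_y = sum(gpa) / n

# Least-squares slope: sum of cross-deviations over sum of squared x deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours_studied, gpa))
s_xx = sum((x - mean_x) ** 2 for x in hours_studied)
slope = s_xy / s_xx

# The fitted line always passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

def predict_gpa(hours):
    """Predicted GPA from the fitted line for a given number of weekly study hours."""
    return intercept + slope * hours

print(f"GPA = {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted GPA at 5 hours: {predict_gpa(5):.2f}")
```

The slope is read as the predicted change in GPA for each additional hour studied per week.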

Frequency Distributions

A frequency distribution shows how often a variable takes on different values.

Job Satisfaction	Frequency (f)
Very satisfied	4
Moderately satisfied	3
A little dissatisfied	2

Proportions and Percentages

Proportion: The frequency of a category divided by the total number of cases. Formula: Proportion = f / N. Example: 4 of the N = 9 respondents in the table above are very satisfied, so the proportion is 4/9 ≈ 0.44.
Percentage: The proportion multiplied by 100. Formula: Percentage = Proportion × 100. Example: 0.44 × 100 = 44%.
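
The two formulas can be applied to every row of the job satisfaction table at once. This is a small Python sketch; the variable names are illustrative.

```python
# Frequencies from the job satisfaction table above.
frequencies = {
    "Very satisfied": 4,
    "Moderately satisfied": 3,
    "A little dissatisfied": 2,
}

n_total = sum(frequencies.values())  # N = total number of cases

# Proportion = f / N; Percentage = Proportion * 100.
proportions = {category: f / n_total for category, f in frequencies.items()}
percentages = {category: p * 100 for category, p in proportions.items()}

for category, f in frequencies.items():
    print(f"{category}: f={f}, "
          f"proportion={proportions[category]:.2f}, "
          f"percentage={percentages[category]:.0f}%")
```

The proportions sum to 1 before rounding; rounded percentages can add up to slightly less or more than 100.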

Visualizing Data Distributions

Bar Charts and Pie Charts

Bar Chart: Displays frequencies or percentages for categories. Each bar’s height corresponds to the value.
Pie Chart: Shows the proportion of categories as slices of a circle. Note: Bar charts are often preferred in academic work for clarity.

Histograms and Line Graphs

Histogram: Used for continuous (interval/ratio) data. It groups data into bins (ranges).
Line Graph: Shows trends over levels of another variable, often time. Example: Plotting average happiness over several years.
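
To show what grouping data into bins means, here is a minimal Python sketch that counts values per bin and prints a text histogram. The ages, the bin width of 5, and the starting point of 15 are all assumptions made for the illustration.

```python
from collections import Counter

# Hypothetical ages in whole years (made up for illustration).
ages = [18, 19, 19, 20, 21, 22, 22, 23, 25, 27, 31, 34]

bin_width = 5   # each bin covers five years
start = 15      # lower edge of the first bin

# Assign each age to a bin index: 0 for 15-19, 1 for 20-24, and so on.
counts = Counter((age - start) // bin_width for age in ages)

for index in sorted(counts):
    low = start + index * bin_width
    high = low + bin_width - 1
    print(f"{low}-{high}: {'#' * counts[index]}")
```

Choosing a different bin width changes the shape of the histogram, so it is worth trying more than one.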

Measures of Center

Mode: The most common value or category. The only measure of center that can be used with nominal variables.
Median: The middle value when data are ordered. Best for skewed data or data with outliers.
Mean (Average): Sum of all values divided by the number of values. Formula: Mean = Σx / n. Sensitive to outliers.
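
The Python sketch below computes all three measures with the standard `statistics` module and then adds one extreme value to show why the mean is called sensitive to outliers. The study-hours data are invented for the illustration.

```python
import statistics

# Hypothetical weekly study hours for nine students (made up for illustration).
hours = [1, 4, 4, 5, 6, 7, 8, 9, 10]

mode_hours = statistics.mode(hours)      # most common value
median_hours = statistics.median(hours)  # middle value of the ordered data
mean_hours = statistics.mean(hours)      # sum of values divided by n

print(f"mode = {mode_hours}, median = {median_hours}, mean = {mean_hours:.1f}")

# Add one extreme value and recompute the median and the mean.
with_outlier = hours + [60]
median_outlier = statistics.median(with_outlier)
mean_outlier = statistics.mean(with_outlier)

print(f"with outlier: median = {median_outlier}, mean = {mean_outlier:.1f}")
```

The single outlier pulls the mean from 6 up to 11.4, while the median moves only from 6 to 6.5.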

Measures of Spread

Range: The difference between the largest and smallest values. Formula: Range = Max – Min.
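Interquartile Range (IQR): The difference between the 3rd quartile (Q3) and the 1st quartile (Q1). Formula: IQR = Q3 − Q1. Measures the spread of the middle 50% of the data.
Variance (s²): The sum of squared deviations from the mean divided by n − 1, which is roughly the average squared deviation. Formula: s² = Σ(x − x̄)² / (n − 1).
Standard Deviation (s): The square root of the variance. It describes the typical distance of values from the mean, in the original units of the variable.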
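
Here is a short Python sketch of the four measures using the standard `statistics` module. The data are invented for the illustration. Note two assumptions: `statistics.variance` and `statistics.stdev` are the sample versions (dividing by n − 1), and quartiles are computed with the module's default method, so other software may report slightly different values for Q1 and Q3.

```python
import statistics

# Hypothetical scores for eight respondents (made up for illustration).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range = Max - Min.
data_range = max(scores) - min(scores)

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Sample variance and standard deviation (both divide by n - 1).
variance = statistics.variance(scores)
sd = statistics.stdev(scores)

print(f"range = {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"variance = {variance:.2f}, standard deviation = {sd:.2f}")
```

The variance is in squared units, which is why the standard deviation is usually the one reported alongside the mean.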

Choosing Measures Based on Data Type

Data Type	Measure of Center	Measure of Spread
Nominal	Mode	N/A
Ordinal	Median or Mean	IQR or Standard Deviation
Continuous	Mean	Standard Deviation or IQR

Practical Tips and R Software

Honesty: Assure respondents of anonymity to encourage honest answers and reduce response bias.
Study Habits: Avoid distractions and ensure 7+ hours of sleep to improve memory.
R Software Steps:
1. Load necessary packages.
2. Set the working directory to your data folder.
3. Read data into R using correct file names.

Academic Integrity

Ethical conduct is essential. Avoid the following:

Copying definitions from Wikipedia.
Working together on individual assignments.
Copying answers from others or using AI to generate answers.
