SOC 222: Measuring the Social World Study Notes
Key Concepts and Definitions
Population vs. Sample
- Population: The entire group you want to study. Example: All students at UTM.
- Sample: A subset of the population used to draw conclusions about the whole population. Example: 100 UTM students surveyed in the library.
- Population Parameter: The true value in the population. Example: The actual percentage of all UTM students who cheat.
- Sample Statistic: The estimate derived from the sample. Example: 15% of surveyed students admit to cheating.
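- Sampling Error: The difference between a sample statistic and the population parameter that arises by chance, because only part of the population is observed. Example: The sample shows 15% cheating, but the true population rate is 12%, a sampling error of 3 percentage points.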
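The sketch below makes the parameter/statistic distinction concrete. It builds a hypothetical population of 10,000 students in which exactly 12% cheat, draws one simple random sample of 100, and reports the sampling error. It uses Python's standard library in place of R. The population size, the 12% rate, and the seed are assumptions chosen only for illustration.

```python
import random

def make_population(size=10_000, cheat_rate=0.12):
    """Hypothetical population: 1 = cheats, 0 = does not cheat."""
    n_cheat = round(size * cheat_rate)
    return [1] * n_cheat + [0] * (size - n_cheat)

population = make_population()
# Population parameter: the true proportion who cheat (0.12 by construction).
parameter = sum(population) / len(population)

rng = random.Random(222)              # fixed seed so the run is repeatable
sample = rng.sample(population, 100)  # simple random sample, without replacement
# Sample statistic: the proportion who cheat in this one sample.
statistic = sum(sample) / len(sample)

# Sampling error: how far the estimate lands from the truth by chance alone.
sampling_error = statistic - parameter

print(f"parameter      = {parameter:.2f}")
print(f"statistic      = {statistic:.2f}")
print(f"sampling error = {sampling_error:+.2f}")
```

Changing the seed draws a different sample and gives a different sampling error. The parameter stays fixed.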
Sampling Types and Biases
- Probability Sampling: Each member has a known, non-zero chance of selection. Example: Simple Random Sample (SRS) where every UTM student has an equal chance to be picked.
- Nonprobability Sampling: Selection without known probabilities. Example: Surveying only students who volunteer online.
- Sampling Bias: Systematic error from a non-representative sample. Example: Surveying only students in the library excludes those who study elsewhere.
- Response Bias: Systematic error that occurs when question wording or the sensitivity of a topic distorts answers. Example: Asking “Do you cheat on exams?” may lead to underreporting because of social desirability.
- Nonresponse Bias: Systematic error that occurs when certain groups do not respond, so the respondents differ from the population. Example: Students who cheat might avoid answering the survey.
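The library example of sampling bias can be simulated. This sketch assumes a hypothetical population in which 3,000 library users cheat at 5% and 7,000 other students cheat at 15%, giving 12% overall. These figures are invented for illustration. A simple random sample of 1,000 from everyone should land near 12%. A sample of 1,000 drawn only from library users is pulled toward 5%, however large it is.

```python
import random

# Hypothetical population: (studies_in_library, cheats) for each student.
library = [(True, 1)] * 150 + [(True, 0)] * 2_850        # 5% of 3,000 cheat
elsewhere = [(False, 1)] * 1_050 + [(False, 0)] * 5_950  # 15% of 7,000 cheat
population = library + elsewhere

def cheat_rate(students):
    """Proportion of the given students who cheat."""
    return sum(cheats for _, cheats in students) / len(students)

parameter = cheat_rate(population)  # 1,200 / 10,000 = 0.12

rng = random.Random(222)
# Probability sample: every student has the same chance of selection.
srs_estimate = cheat_rate(rng.sample(population, 1_000))
# Biased design: students who study elsewhere have zero chance of selection.
library_estimate = cheat_rate(rng.sample(library, 1_000))

print(f"true rate     = {parameter:.3f}")
print(f"random sample = {srs_estimate:.3f}")
print(f"library only  = {library_estimate:.3f}")
```

Nonresponse bias works the same way. If cheaters are less likely to answer, the respondents form a non-representative group even when the original draw was random.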
Representative Samples
- A sample that reflects important characteristics of the population. Example: A sample including students from all faculties and years at UTM.
- Larger probability samples reduce sampling error, though not bias from a non-representative design. Example: A random sample of 500 students gives more precise estimates than a random sample of 50.
- Probability sampling helps achieve representativeness. Example: Using a random number generator to select participants.
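One way to check that larger samples reduce sampling error is to repeat the sampling many times. This sketch reuses a hypothetical population with a 12% cheating rate, which is an assumption. At each sample size it uses Python's random number generator to draw 1,000 simple random samples, then averages the absolute sampling error. `average_sampling_error` is a hypothetical helper, not part of any library. The error for n = 500 should come out noticeably smaller than for n = 50.

```python
import random
from statistics import mean

def make_population(size=10_000, cheat_rate=0.12):
    """Hypothetical population: 1 = cheats, 0 = does not cheat."""
    n_cheat = round(size * cheat_rate)
    return [1] * n_cheat + [0] * (size - n_cheat)

def average_sampling_error(population, n, reps=1_000, seed=222):
    """Average absolute gap between the sample proportion and the true
    proportion across `reps` simple random samples of size n."""
    rng = random.Random(seed)  # random number generator picks the participants
    parameter = sum(population) / len(population)
    errors = []
    for _ in range(reps):
        sample = rng.sample(population, n)
        errors.append(abs(sum(sample) / n - parameter))
    return mean(errors)

population = make_population()
error_50 = average_sampling_error(population, 50)
error_500 = average_sampling_error(population, 500)
print(f"n = 50:  average sampling error = {error_50:.3f}")
print(f"n = 500: average sampling error = {error_500:.3f}")
```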
Levels of Measurement
- Nominal: Categories without a natural order. Example: Student majors (Sociology, Psychology, Biology).
- Ordinal: Ordered categories. Example: Class standing (Freshman, Sophomore, Junior, Senior).
- Interval/Ratio: Continuous scales with meaningful distances. Example: Age in years or GPA scores.
Statistical Inference
- Use samples to estimate population means and proportions. Example: Estimating the average GPA of all UTM students from a sample.
- Compare means or proportions between groups. Example: Comparing the average GPA of male vs. female students.
- Test independence between variables. Example: Testing if gender is independent of cheating behavior.
- Estimate linear relationships (regression). Example: Predicting GPA based on hours studied per week.
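The regression example, predicting GPA from weekly study hours, can be sketched with the least-squares formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope × x̄. The five students below are made-up numbers for illustration only. The code is Python rather than R; in R, `lm()` fits the same kind of line.

```python
from statistics import mean

# Hypothetical data: weekly study hours and GPA for five students.
hours = [0, 5, 10, 15, 20]
gpa = [2.1, 2.4, 3.0, 3.6, 3.9]

def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

a, b = least_squares(hours, gpa)
print(f"predicted GPA = {a:.2f} + {b:.3f} * hours")
print(f"prediction for 12 hours: {a + b * 12:.2f}")
```

With these numbers the slope is 0.096, so each extra weekly study hour is associated with about 0.1 GPA points in this made-up sample.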
Frequency Distributions
A frequency distribution shows how often a variable takes on different values.
| Job Satisfaction | Frequency (f) |
|---|---|
| Very satisfied | 4 |
| Moderately satisfied | 3 |
| A little dissatisfied | 2 |
| Total (N) | 9 |
Proportions and Percentages
- Proportion: The frequency of a category divided by the total number of cases. Formula: Proportion = f / N. Example (Very satisfied): 4/9 ≈ 0.44.
- Percentage: The proportion multiplied by 100. Formula: Percentage = Proportion × 100. Example: 0.44 × 100 = 44%.
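The proportion and percentage formulas can be applied to the job-satisfaction table with a few lines of Python, used here in place of R. The sketch rebuilds the nine raw responses from the table's counts and tallies them with `collections.Counter`.

```python
from collections import Counter

# Raw responses rebuilt from the job-satisfaction frequency table (N = 9).
responses = (["Very satisfied"] * 4
             + ["Moderately satisfied"] * 3
             + ["A little dissatisfied"] * 2)

freq = Counter(responses)  # frequency (f) of each category
N = sum(freq.values())     # total number of cases

print(f"{'Job Satisfaction':<24}{'f':>3}{'Proportion':>12}{'Percent':>9}")
for category, f in freq.items():
    proportion = f / N             # Proportion = f / N
    percentage = proportion * 100  # Percentage = Proportion x 100
    print(f"{category:<24}{f:>3}{proportion:>12.2f}{percentage:>8.1f}%")
```

For "Very satisfied" this prints 0.44 and 44.4%, matching the worked example.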
Visualizing Data Distributions
Bar Charts and Pie Charts
- Bar Chart: Displays frequencies or percentages for categories. Each bar’s height shows that category’s frequency or percentage.
- Pie Chart: Shows the proportion of categories as slices of a circle. Note: Bar charts are often preferred in academic work for clarity.
Histograms and Line Graphs
- Histogram: Used for continuous (interval/ratio) data. It groups values into bins (ranges), and each bar’s height shows how many cases fall in that bin.
- Line Graph: Shows trends over levels of another variable, often time. Example: Plotting average happiness over several years.
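The binning step behind a histogram is easy to sketch. Each value is assigned to the bin whose range contains it, and then the bins are counted. The example below uses hypothetical ages and assumes a bin width of 5 years starting at 15. It prints a text histogram; in practice a plotting tool such as R's `hist()` or `ggplot2` draws the bars. Empty bins are not printed in this sketch.

```python
from collections import Counter

WIDTH = 5   # bin width in years (assumption)
START = 15  # left edge of the first bin (assumption)

def bin_counts(values, width, start):
    """Count how many values fall in each bin [lo, lo + width)."""
    counts = Counter()
    for v in values:
        lo = start + ((v - start) // width) * width  # left edge of v's bin
        counts[lo] += 1
    return dict(sorted(counts.items()))

# Hypothetical ages of 11 students (illustrative values only).
ages = [18, 19, 19, 20, 21, 22, 22, 23, 25, 27, 31]

# Text histogram: one row per bin, one '#' per case.
for lo, count in bin_counts(ages, WIDTH, START).items():
    print(f"{lo}-{lo + WIDTH - 1}: {'#' * count}")
```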
Measures of Center
- Mode: The most common value or category. Best for nominal variables.
- Median: The middle value when data are ordered. Best for skewed data or data with outliers.
- Mean (Average): Sum of all values divided by the number of values. Formula: Mean = Σx / n. Sensitive to outliers.
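Python's `statistics` module, standing in for the equivalent R functions, shows the effect of an outlier. In the hypothetical incomes below, one very large value drags the mean far above the median and mode. This is why the median is preferred for skewed data.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes in $1,000s; the last value is an outlier.
incomes = [20, 22, 25, 25, 30, 200]

print("mode   =", mode(incomes))            # most common value
print("median =", median(incomes))          # middle of the ordered values
print("mean   =", round(mean(incomes), 2))  # sum of x / n, pulled up by the outlier
```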
Measures of Spread
- Range: The difference between the largest and smallest values. Formula: Range = Max – Min.
- Interquartile Range (IQR): The difference between the 3rd quartile (Q3) and 1st quartile (Q1). Measures the middle 50% of data.
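- Variance (s²): Roughly the average squared deviation from the mean. For a sample, the sum of squared deviations is divided by n − 1 rather than n. Formula: s² = Σ(x − x̄)² / (n − 1), where x̄ is the sample mean.
- Standard Deviation (s): The square root of the variance. It describes the typical distance of values from the mean, in the variable’s original units.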
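The spread formulas can be checked by computing them by hand and comparing the results with Python's `statistics` module. The data are hypothetical. Quartile rules differ between software packages, so this sketch picks `method="inclusive"`. Other rules, including some R `quantile()` types, can give slightly different Q1, Q3, and IQR values.

```python
import math
from statistics import mean, quantiles, stdev, variance

# Hypothetical weekly study hours for seven students.
data = [2, 4, 4, 5, 7, 9, 11]

data_range = max(data) - min(data)                    # Range = Max - Min
q1, _, q3 = quantiles(data, n=4, method="inclusive")  # quartiles Q1, Q2, Q3
iqr = q3 - q1                                         # spread of the middle 50%

# Sample variance by hand: sum of squared deviations divided by n - 1.
x_bar = mean(data)
s2 = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)
s = math.sqrt(s2)  # standard deviation, back in the original units

# The statistics module uses the same n - 1 definition.
assert math.isclose(s2, variance(data)) and math.isclose(s, stdev(data))

print(f"range = {data_range}, IQR = {iqr}, s^2 = {s2}, s = {s:.2f}")
```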
Choosing Measures Based on Data Type
| Data Type | Measure of Center | Measure of Spread |
|---|---|---|
| Nominal | Mode | N/A |
| Ordinal | Median or Mean | IQR or Standard Deviation |
| Interval/Ratio | Mean (Median if skewed) | Standard Deviation or IQR |
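The table's rule for choosing a measure of center can be written as a small function. `center` and the level labels are hypothetical, invented for illustration. The function returns the mode for nominal data, the median category for ordinal data, and the mean for interval/ratio data. For ordinal data it uses `median_low`, so the answer is always an actual category rather than a point halfway between two categories.

```python
from statistics import mean, median_low, mode

def center(values, level, order=None):
    """Return a measure of center suited to the level of measurement.

    level: "nominal", "ordinal", or "interval/ratio" (hypothetical labels).
    order: for ordinal data, the categories listed from lowest to highest.
    """
    if level == "nominal":
        return mode(values)                       # most common category
    if level == "ordinal":
        ranks = [order.index(v) for v in values]  # convert categories to ranks
        return order[median_low(ranks)]           # median category (lower one if n is even)
    if level == "interval/ratio":
        return mean(values)                       # arithmetic mean
    raise ValueError(f"unknown level of measurement: {level!r}")

STANDING = ["Freshman", "Sophomore", "Junior", "Senior"]

print(center(["Sociology", "Biology", "Sociology"], "nominal"))
print(center(["Junior", "Freshman", "Senior", "Junior", "Sophomore"], "ordinal", STANDING))
print(center([3.0, 3.5, 4.0], "interval/ratio"))
```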
Practical Tips and R Software
- Honesty: Assure respondents of anonymity to encourage honest answers and reduce response bias.
- Study Habits: Avoid distractions and ensure 7+ hours of sleep to improve memory.
- R Software Steps:
  - Load the necessary packages.
  - Set the working directory to your data folder.
  - Read the data into R using the exact file name, including the extension.
Academic Integrity
Ethical conduct is essential. Avoid the following:
- Copying definitions from Wikipedia.
- Working together on individual assignments.
- Copying answers from others or using AI to generate answers.
