Data Types and Statistical Analysis Concepts Explained
Q1. Data Types: Categorical vs. Numerical
[8–9 Marks]
Answer:
Data comprises raw facts and figures collected for analysis and decision-making. Based on its nature, data is mainly classified into categorical data and numerical data.
1) Categorical Data (Qualitative)
Categorical data represents qualities or categories and cannot be measured numerically.
Types:
Nominal: No natural order
Example: Gender (Male/Female), Blood Group
Ordinal: Ordered categories
Example: Grades (A, B, C), Satisfaction level
Example:
Car color → Red, Blue, Black
2) Numerical Data (Quantitative)
Numerical data represents quantities that can be counted or measured.
Types:
Discrete: Countable values
Example: Number of students, number of cars
Continuous: Measurable values
Example: Height, weight, temperature
Example:
Marks obtained → 78, 85, 92
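A small Python sketch of which operations make sense for each data type. All values and the grade ranking (C < B < A) are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical records, one list per data type described above.
blood_groups = ["A", "O", "B", "O", "AB", "O"]   # nominal: no natural order
grades = ["B", "A", "C", "A"]                    # ordinal: ordered categories
GRADE_RANK = {"C": 1, "B": 2, "A": 3}            # assumed ranking: C < B < A
students_per_class = [30, 28, 35]                # discrete: countable
heights_cm = [160.5, 172.25, 168.0]              # continuous: measured

# Nominal data: counting categories is meaningful, ordering is not.
print(Counter(blood_groups).most_common(1))

# Ordinal data: categories can be ranked, but the gaps between ranks are not measured amounts.
print(max(grades, key=GRADE_RANK.get))

# Numerical data: arithmetic such as totals and averages is meaningful.
print(sum(students_per_class))
print(sum(heights_cm) / len(heights_cm))
```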
Conclusion:
Categorical data describes type, while numerical data describes quantity and is used for mathematical analysis.
Q2. Data Visualization: Line, Scatter, and Box Plots
[9 Marks]
Answer:
Data Visualization is the graphical representation of data to identify patterns, trends, and relationships easily.
1) Line Plot
- Shows how data changes over time
- Data points connected by lines
Example:
Daily temperature changes over a week
Use:
Time-series analysis
2) Scatter Plot
- Shows relationship between two numerical variables
- Each point represents one observation
Example:
Height vs Weight of students
Use:
Finding correlation between variables
3) Box Plot
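- Displays the distribution using quartiles: the box spans Q1 to Q3, with a line at the median
- Whiskers reach the minimum and maximum values (commonly those within 1.5 × IQR of the box); points beyond them are shown as outliers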
Example:
Exam marks distribution of students
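The numbers a box plot draws can be computed directly. This is a minimal sketch assuming the common 1.5 × IQR whisker rule and Python's "inclusive" quartile method; other tools may use slightly different quartile definitions. The marks are hypothetical.

```python
import statistics

# Hypothetical exam marks; 12 is deliberately far below the rest.
marks = [12, 55, 58, 60, 62, 65, 66, 70, 72, 75, 78, 80, 95]

def box_plot_summary(data):
    """Return the values a box plot displays, using the 1.5 x IQR outlier rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range = box height
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside),    # smallest value that is not an outlier
        "whisker_high": max(inside),   # largest value that is not an outlier
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

print(box_plot_summary(marks))
```

Here the box runs from 60 to 75 with the median at 66, and the mark of 12 falls below the lower fence, so it would be drawn as a separate outlier point.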
Conclusion:
Visualization helps in quick understanding, comparison, and decision-making.
Q3. Mean, Median, and Mode Explained
[8 Marks]
Answer:
Mean
Sum of all values divided by the number of values (the arithmetic average).
$\text{Mean} = \bar{x} = \frac{\sum x}{n}$
Median
Middle value after arranging the data in order (for an even number of values, the average of the two middle values).
Mode
Value that occurs most frequently.
Example:
Data: 10, 20, 20, 30, 40
- Mean = (10+20+20+30+40)/5 = 24
- Median = 20
- Mode = 20
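The same example can be checked with Python's standard `statistics` module. The second data set, with 40 replaced by an extreme value of 400, is hypothetical and shows why the mean is sensitive to outliers.

```python
import statistics

data = [10, 20, 20, 30, 40]            # the example data above
print(statistics.mean(data))           # 24: sum / n
print(statistics.median(data))         # 20: middle value of the sorted data
print(statistics.mode(data))           # 20: most frequent value

# For an even count, the median averages the two middle values.
print(statistics.median([10, 20, 30, 40]))   # 25.0

# Hypothetical extreme value: the mean jumps, the median and mode do not move.
skewed = [10, 20, 20, 30, 400]
print(statistics.mean(skewed), statistics.median(skewed), statistics.mode(skewed))  # 96 20 20
```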
Conclusion:
Mean is affected by extreme values, while Median and Mode represent central tendency better in skewed data.
Q4. Standard Deviation: Measuring Data Spread
[8 Marks]
Answer:
Standard Deviation (SD) measures dispersion or spread of data around the mean.
Formula:
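$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
where $\bar{x}$ is the mean and $n$ is the number of values. This is the population SD; the sample SD ($s$) divides by $n - 1$ instead of $n$.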
Key Points:
- Low SD → data close to mean
- High SD → data widely spread
- Used in risk analysis and quality control
Example:
Exam scores with high SD indicate varied student performance.
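A short sketch of the formula applied by hand, cross-checked against the standard library. The two classes of scores are hypothetical and share the same mean (50), so only their spread differs.

```python
import math
import statistics

def population_sd(values):
    """sigma = sqrt(sum((x - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

# Two hypothetical classes with the same mean (50) but different spread.
consistent = [48, 50, 52]
varied = [20, 50, 80]

print(population_sd(consistent))   # low SD: scores close to the mean
print(population_sd(varied))       # high SD: scores widely spread
print(statistics.pstdev(varied))   # library population SD, same result
print(statistics.stdev(varied))    # sample SD divides by n - 1 instead
```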
Conclusion:
Standard deviation helps in understanding data variability and consistency.
Q5. Univariate vs. Bivariate Analysis Differentiation
[8 Marks]
| Basis | Univariate | Bivariate |
|---|---|---|
| Variables | One | Two |
| Purpose | Describe data | Relationship analysis |
| Examples | Mean, Median | Correlation |
| Graphs | Histogram | Scatter plot |
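The difference can be sketched in a few lines of Python: univariate statistics summarise one variable, while a bivariate measure such as Pearson's correlation coefficient ($r$, between -1 and 1) relates two. The heights and weights are hypothetical, and `pearson_r` is a hand-written helper, not a library function.

```python
import math
import statistics

# Hypothetical measurements for five students.
heights = [150, 155, 160, 165, 170]   # cm
weights = [45, 50, 54, 60, 63]        # kg

# Univariate: describe one variable on its own.
print("mean height:", statistics.mean(heights))
print("median height:", statistics.median(heights))

def pearson_r(xs, ys):
    """Bivariate: Pearson correlation between two numerical variables."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print("r(height, weight):", pearson_r(heights, weights))
```

A value of $r$ close to +1, as here, indicates a strong positive linear relationship, which is what a scatter plot of these two variables would show as points rising along a line.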
Conclusion:
Univariate analysis focuses on a single variable, while bivariate analysis studies the relationship between two variables.
Q6. Histogram and Statistical Thinking
[9 Marks]
Answer:
Histogram
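A histogram shows the frequency distribution of numerical data using adjacent bars; each bar's height is the number of values falling in a class interval (bin).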
Example:
Marks distribution of students
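Behind every histogram is a frequency count per bin. This sketch assumes hypothetical integer marks and bins of width 10 starting at 30; values outside the chosen bins are simply ignored.

```python
# Hypothetical marks for 15 students.
marks = [35, 42, 47, 51, 55, 58, 61, 63, 64, 67, 72, 74, 78, 85, 91]

def histogram_counts(data, start, width, bins):
    """Count values in each bin [start + i*width, start + (i+1)*width); others are skipped."""
    counts = [0] * bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < bins:
            counts[i] += 1
    return counts

counts = histogram_counts(marks, start=30, width=10, bins=7)
for i, c in enumerate(counts):
    low = 30 + i * 10
    print(f"{low}-{low + 9}: {'#' * c}")   # text bar; labels assume integer marks
```

The text bars peak in the 60-69 bin and taper on both sides, the kind of pattern a histogram makes visible at a glance.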
Statistical Thinking
Involves:
- Understanding variation
- Using data for decision-making
- Interpreting patterns and trends
Example:
Using sales data to improve business strategy.
Conclusion:
Histogram supports statistical thinking by visually showing data behavior.
Q7. Normal Distribution Characteristics
[8 Marks]
Answer:
The normal distribution is a symmetric, bell-shaped curve where:
- Mean = Median = Mode
- Most values lie near the mean
Properties:
- About 68% of the data lies within 1 SD of the mean
- About 95% within 2 SD
- About 99.7% within 3 SD
Example:
Heights of students in a class.
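The 68-95-99.7 rule can be checked with `statistics.NormalDist` from Python's standard library. The mean of 165 cm and SD of 7 cm for class heights are assumed values for illustration.

```python
from statistics import NormalDist

# Assumed example: class heights ~ Normal(mean 165 cm, SD 7 cm).
heights = NormalDist(mu=165, sigma=7)

shares = {}
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    # Probability of falling between the bounds = difference of the CDF values.
    shares[k] = heights.cdf(high) - heights.cdf(low)
    print(f"within {k} SD ({low:.0f}-{high:.0f} cm): {shares[k]:.4f}")
```

The printed shares are about 0.6827, 0.9545 and 0.9973; changing `mu` and `sigma` does not change them, which is why the rule applies to any normal distribution.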
Conclusion:
Normal distribution is widely used in statistics and machine learning.
Q8. Poisson Distribution for Event Probability
[8 Marks]
Answer:
The Poisson distribution models the number of times an event occurs in a fixed interval of time or space, given a known average rate (λ).
Conditions:
- Events are independent
- Constant average rate
Example:
Number of calls received per hour in a call center.
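The Poisson probability of exactly $k$ events is $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$. A minimal sketch for the call-centre example, assuming a hypothetical average of 4 calls per hour:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!, where lam is the average count per interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical call centre receiving 4 calls per hour on average.
lam = 4
for k in range(7):
    print(f"P({k} calls in an hour) = {poisson_pmf(k, lam):.4f}")

# P(more than 6 calls) by the complement rule.
p_more_than_6 = 1 - sum(poisson_pmf(k, lam) for k in range(7))
print(f"P(more than 6 calls) = {p_more_than_6:.4f}")
```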
Conclusion:
Used for rare event probability analysis.
