Data Types and Statistical Analysis Concepts Explained

Q1. Data Types: Categorical vs. Numerical

[8–9 Marks]

Answer:
Data comprises raw facts and figures collected for analysis and decision-making. Based on its nature, data is mainly classified into categorical data and numerical data.

1) Categorical Data (Qualitative)

Categorical data represents qualities or labels. Its values name a group rather than measure an amount, so arithmetic on them is not meaningful.

Types:

  • Nominal: No natural order

    Example: Gender (Male/Female), Blood Group
  • Ordinal: Ordered categories

    Example: Grades (A, B, C), Satisfaction level

Example:
Car color → Red, Blue, Black

2) Numerical Data (Quantitative)

Numerical data consists of numbers obtained by counting or measuring, so arithmetic operations on it are meaningful.

Types:

  • Discrete: Countable values

    Example: Number of students, number of cars
  • Continuous: Measured values that can take any value within a range

    Example: Height, weight, temperature

Example:
Marks obtained → 78, 85, 92
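
As a rough illustration of the difference, the sketch below stores the car colors and marks from the examples above as Python lists. The variable names and the repeated colors are illustrative assumptions. Categorical values are summarized by counting each category, while numerical values support arithmetic such as an average.

```python
from collections import Counter
from statistics import mean

# Categorical (nominal) data: labels with no numeric meaning.
# The repeated entries are an illustrative assumption.
car_colors = ["Red", "Blue", "Black", "Red", "Blue", "Red"]

# Numerical data: values that support arithmetic.
marks = [78, 85, 92]

# A sensible summary for categories is a frequency count.
color_counts = Counter(car_colors)

# A sensible summary for numbers includes arithmetic such as the mean.
average_mark = mean(marks)

print(color_counts["Red"])  # 3
print(average_mark)         # 85
```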

Conclusion:
Categorical data describes type, while numerical data describes quantity and is used for mathematical analysis.


Q2. Data Visualization: Line, Scatter, and Box Plots

[9 Marks]

Answer:
Data Visualization is the graphical representation of data to identify patterns, trends, and relationships easily.

1) Line Plot

  • Shows change of data over time
  • Data points connected by lines

Example:
Daily temperature changes over a week

Use:
Time-series analysis

2) Scatter Plot

  • Shows relationship between two numerical variables
  • Each point represents one observation

Example:
Height vs Weight of students

Use:
Finding correlation between variables
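
The strength of the relationship seen in a scatter plot can be expressed as a number. The sketch below computes the Pearson correlation coefficient, which is one common measure and is chosen here as an assumption. The function name `pearson_r` and the height and weight values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of the coefficient).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical heights (cm) and weights (kg) of five students.
heights = [150, 155, 160, 165, 170]
weights = [45, 50, 54, 60, 66]

r = pearson_r(heights, weights)
print(round(r, 3))
```

A value close to +1 means the points rise together along a nearly straight line, a value close to -1 means one falls as the other rises, and a value near 0 means no linear relationship.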

3) Box Plot

  • Displays distribution using quartiles
  • Shows median, minimum, maximum, and outliers

Example:
Exam marks distribution of students
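
The values a box plot draws can be computed directly. The sketch below uses hypothetical exam marks and assumes the common 1.5 × IQR rule for flagging outliers. Plotting tools may compute quartiles slightly differently.

```python
from statistics import quantiles

# Hypothetical exam marks for eleven students.
marks = [35, 42, 48, 50, 55, 58, 60, 62, 65, 70, 98]

# Three cut points that split the sorted data into four parts.
q1, median, q3 = quantiles(marks, n=4)
iqr = q3 - q1  # interquartile range: the height of the box

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [m for m in marks if m < lower_fence or m > upper_fence]

print(q1, median, q3)
print(outliers)
```

Here the mark 98 lies above the upper fence, so a box plot would draw it as a separate point.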

Conclusion:
Visualization helps in quick understanding, comparison, and decision-making.


Q3. Mean, Median, and Mode Explained

[8 Marks]

Answer:

Mean

Average of all values

$$\text{Mean } (\bar{x}) = \frac{\sum x}{n}$$

Median

Middle value after arranging data in order. For an even number of values, it is the average of the two middle values.

Mode

Value that occurs most frequently.

Example:
Data: 10, 20, 20, 30, 40

  • Mean = (10+20+20+30+40)/5 = 24
  • Median = 20
  • Mode = 20
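
The same calculation can be checked with Python's standard `statistics` module. The second data set, with 40 replaced by 400, is a hypothetical addition to show how one extreme value moves the mean but not the median or mode.

```python
from statistics import mean, median, mode

data = [10, 20, 20, 30, 40]
print(mean(data))    # 24
print(median(data))  # 20
print(mode(data))    # 20

# Hypothetical variation: replace 40 with an extreme value.
skewed = [10, 20, 20, 30, 400]
print(mean(skewed))    # 96
print(median(skewed))  # 20
print(mode(skewed))    # 20
```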

Conclusion:
Mean is affected by extreme values, while Median and Mode represent central tendency better in skewed data.


Q4. Standard Deviation: Measuring Data Spread

[8 Marks]

Answer:
Standard Deviation (SD) measures dispersion or spread of data around the mean.

Formula:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Key Points:

  • Low SD → data close to mean
  • High SD → data widely spread
  • Used in risk analysis and quality control

Example:
Exam scores with high SD indicate varied student performance.
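
A minimal sketch of the calculation, step by step, using the population form of the formula (dividing by n). The scores are hypothetical. The result is compared with `statistics.pstdev`, which computes the same population standard deviation.

```python
from math import sqrt
from statistics import pstdev

scores = [10, 20, 20, 30, 40]  # hypothetical exam scores
n = len(scores)

# Step 1: mean of the data.
mean_score = sum(scores) / n  # 24.0

# Step 2: squared deviation of each value from the mean.
squared_devs = [(x - mean_score) ** 2 for x in scores]

# Step 3: average the squared deviations, then take the square root.
sigma = sqrt(sum(squared_devs) / n)

print(round(sigma, 3))           # 10.198
print(round(pstdev(scores), 3))  # 10.198
```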

Conclusion:
Standard deviation helps in understanding data variability and consistency.


Q5. Univariate vs. Bivariate Analysis Differentiation

[8 Marks]

Basis     | Univariate    | Bivariate
Variables | One           | Two
Purpose   | Describe data | Relationship analysis
Examples  | Mean, Median  | Correlation
Graphs    | Histogram     | Scatter plot

Conclusion:
Univariate analysis focuses on a single variable, while bivariate analysis studies the relationship between two variables.


Q6. Histogram and Statistical Thinking

[9 Marks]

Answer:

Histogram

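A histogram shows the frequency distribution of a numerical variable using adjacent bars. Each bar covers a range of values (a bin), and its height gives the number of observations in that range.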

Example:
Marks distribution of students
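
The counting behind a histogram can be sketched in a few lines. The marks and the bin width of 10 are hypothetical choices. Each mark is assigned to a bin, and the bin counts are printed as a simple text histogram.

```python
from collections import Counter

# Hypothetical marks of thirteen students.
marks = [42, 47, 55, 58, 61, 63, 67, 68, 72, 75, 78, 81, 89]
bin_width = 10  # assumed bin size

# Map each mark to the lower edge of its bin, e.g. 67 -> 60.
bins = Counter((m // bin_width) * bin_width for m in marks)

# One row per bin; the number of '#' marks is the frequency.
for lower in sorted(bins):
    print(f"{lower}-{lower + bin_width - 1}: {'#' * bins[lower]}")
```

The longest row (60-69) marks the range where most students scored.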

Statistical Thinking

Involves:

  • Understanding variation
  • Using data for decision-making
  • Interpreting patterns and trends

Example:
Using sales data to improve business strategy.

Conclusion:
Histogram supports statistical thinking by visually showing data behavior.


Q7. Normal Distribution Characteristics

[8 Marks]

Answer:
Normal distribution is a symmetric, bell-shaped curve where:

  • Mean = Median = Mode
  • Most values lie near the mean

Properties:

  • About 68% of data lies within 1 SD of the mean
  • About 95% lies within 2 SD
  • About 99.7% lies within 3 SD
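
These percentages can be checked with `statistics.NormalDist` from the Python standard library. The helper name `share_within` is an illustrative assumption. It subtracts the cumulative probability at -k SD from the one at +k SD for a standard normal distribution.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```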

Example:
Heights of students in a class.

Conclusion:
Normal distribution is widely used in statistics and machine learning.


Q8. Poisson Distribution for Event Probability

[8 Marks]

Answer:
Poisson distribution models the number of times an event occurs in a fixed interval.

Conditions:

  • Events are independent
  • Constant average rate

Example:
Number of calls received per hour in a call center.
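
The probability of exactly k events follows from the standard Poisson formula P(X = k) = e^(-λ) λ^k / k!, where λ is the average number of events per interval. The sketch below applies it to the call center example. The rate of 4 calls per hour and the function name `poisson_pmf` are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return exp(-lam) * lam ** k / factorial(k)

calls_per_hour = 4  # hypothetical average rate (lambda)

# Probability of receiving exactly 0, 1, ..., 8 calls in one hour.
for k in range(9):
    print(k, round(poisson_pmf(k, calls_per_hour), 4))

# Probability of more than 8 calls: 1 minus the sum of the cases above.
p_more_than_8 = 1 - sum(poisson_pmf(k, calls_per_hour) for k in range(9))
print(round(p_more_than_8, 4))
```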

Conclusion:
Used to estimate the probability of a given number of events, especially rare ones, in a fixed interval.