Essential Statistical Concepts for Data Analysis

Fundamentals of Business Statistics

Statistics Definition

The discipline concerned with collecting, analyzing, interpreting, and presenting data to support decision-making in business.

Descriptive Statistics

Covers the collection, description, summarization, and visualization of data. It focuses on the data as they are, without generalizing beyond them.

Inferential Statistics

The construction of models, inferences, and predictions about the phenomenon under study, using sample data to draw conclusions about the population (for example, predicting how a variable will behave).

Populations and Samples

Population (N)

The complete set of objects or individuals under study that share at least one common characteristic.

Finite Population

A population where the number of elements is finite.

Infinite Population

A population where the number of elements is infinite.

Sample

A subset of the population.

Random Sample

A sample whose elements are chosen at random, so that every element of the population has a known (typically equal) chance of being selected.
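
As an illustration, a simple random sample without replacement can be drawn with Python's standard library. This is a minimal sketch: the population of 100 numbered elements, the sample size of 10, and the fixed seed are all assumptions made for the example.

```python
import random

# Hypothetical population: elements numbered 1..100 (illustrative only).
population = list(range(1, 101))

# A fixed seed is an assumption made here so the draw is reproducible.
rng = random.Random(42)

# Draw 10 distinct elements; each element is equally likely to be chosen.
sample = rng.sample(population, k=10)

print(sample)
```

Because `sample` draws without replacement, no element can appear twice in the result.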

Sampling Methods

Stratified Sampling

The population is split into strata (subgroups) according to a characteristic relevant to the study, and a sample is then drawn from each stratum.
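
The idea can be sketched in Python as follows. The function name `stratified_sample`, the department labels, and the 10% sampling fraction are hypothetical, and drawing the same fraction from every stratum (proportional allocation) is only one of several possible allocation rules.

```python
import random
from collections import defaultdict

# Hypothetical population: (employee_id, department) pairs.
population = [(i, "sales") for i in range(60)] + \
             [(i, "finance") for i in range(60, 100)]

def stratified_sample(units, key, fraction, seed=0):
    """Draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)      # group units by stratum label
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

sample = stratified_sample(population, key=lambda u: u[1], fraction=0.1)
print(sample)
```

With these numbers the sample contains 6 units from the sales stratum and 4 from finance, mirroring the 60/40 split of the population.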

Systematic Sampling

Selects sample elements according to a fixed pattern or criterion, typically every k-th element of an ordered list after a random starting point.
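
One common pattern, assumed here for illustration, is to pick a random starting point and then take every k-th element of an ordered list. The function name `systematic_sample` and the population of 100 numbered elements are hypothetical.

```python
import random

def systematic_sample(units, n, seed=0):
    """Take every k-th unit after a random start, where k = len(units) // n."""
    rng = random.Random(seed)
    k = len(units) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return units[start::k][:n]   # every k-th unit, capped at n units

# Hypothetical ordered population: elements numbered 1..100.
population = list(range(1, 101))
sample = systematic_sample(population, n=10)
print(sample)
```

The selected elements are always exactly k positions apart; only the starting point is random.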

Variables and Data Types

Variable

Any characteristic observed in a population that can take different values. Variables are classified according to how they are measured:

Quantitative Variables

Variables that take numeric values. These may be:

  • Continuous: Can take any value within a given range, including decimals (e.g., weight, height, salary).
  • Discrete: Takes only countable, typically integer, values (e.g., number of students, number of children).

Qualitative Variables

Variables that describe qualities or categories. These can be:

  • Nominal: Categories without inherent order or rank (e.g., civil status).
  • Ordinal: Categories with an inherent order or rank (e.g., education level, satisfaction levels).

Independent Variable

The variable evaluated for its ability to influence or affect other variables.

Dependent Variable

The variable whose changes are observed as a result of manipulating the independent variable.

Data Representation and Tabulation

Data Entry Table

A simple table that lists only the raw data obtained from the research. It is used when no further information is needed. The mean and median can be calculated directly from these data.

Frequency Table

Created through tabulation and grouping: each value of the variable is paired with its frequency, the number of times it occurs.
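
A frequency table can be built from raw data with a short Python sketch. The data (number of children per household) are invented for the example, and the relative-frequency column is an optional extra.

```python
from collections import Counter

# Hypothetical raw data: number of children in each of 10 households.
children = [0, 2, 1, 2, 3, 0, 2, 1, 1, 2]

freq = Counter(children)   # maps each value to the number of times it occurs
n = len(children)

print("value  count  relative")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>5}  {freq[value] / n:>8.2f}")
```

The counts always add up to the number of observations, which is a quick check that no value was lost during tabulation.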

Double-Entry Tables

Tables used to display data concerning two variables simultaneously (bivariate data).
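
A minimal sketch of a double-entry table in Python, counting how often each pair of values occurs. The records (department and satisfaction level) are hypothetical.

```python
from collections import Counter

# Hypothetical survey records: (department, satisfaction) pairs.
records = [
    ("sales", "high"), ("sales", "low"), ("sales", "high"),
    ("finance", "low"), ("finance", "high"), ("finance", "low"),
    ("sales", "high"),
]

counts = Counter(records)                        # joint frequency of each pair
rows = sorted({dept for dept, _ in records})     # row labels: first variable
cols = sorted({level for _, level in records})   # column labels: second variable

print("dept".ljust(10) + "".join(c.rjust(6) for c in cols))
for r in rows:
    print(r.ljust(10) + "".join(str(counts[(r, c)]).rjust(6) for c in cols))
```

Each cell holds the number of records that share that row value and that column value.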

Diagrams and Charts

A pattern formed by lines or shapes used to represent data visually.

Bar Diagram

Used to represent qualitative attributes and discrete quantitative variables.

  • Simple Bar Diagram: Graphical representation of unique facts or a single series.
  • Multiple Bar Diagram: Advisable for comparing several statistical series.
  • Composite Bar Diagram: The bars of the second series are stacked on top of the corresponding bars of the first series.

Bivariate Analysis Diagram

A diagram for analyzing data on two variables at once, aiming to discover and study their joint statistical properties.

Measures of Central Tendency

Arithmetic Mean (Average)

The average value of the data series. It can be computed as a simple mean or as a weighted mean, as defined below.

Simple Arithmetic Mean

The sum of all elements of the series divided by the total number of elements.

Weighted Arithmetic Mean

The sum of the products of each element of the series and its respective frequency (weight), divided by the sum of the frequencies.
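
The two definitions can be compared with a small Python sketch. The data are invented; when the weights are frequencies, the total number of elements equals the sum of the frequencies, so both calculations give the same result.

```python
# Hypothetical raw series: number of children in each of 10 households.
series = [0, 2, 1, 2, 3, 0, 2, 1, 1, 2]

# Simple arithmetic mean: sum of all elements divided by their count.
simple_mean = sum(series) / len(series)

# The same data summarized as value -> frequency.
frequencies = {0: 2, 1: 3, 2: 4, 3: 1}

# Weighted arithmetic mean: sum of value * frequency over the sum of frequencies.
weighted_mean = (
    sum(value * f for value, f in frequencies.items())
    / sum(frequencies.values())
)

print(simple_mean, weighted_mean)
```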

Median

The value located exactly in the center of the data when they are ordered. If the number of values is even, the median is the average of the two central values.
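
A minimal Python sketch of the calculation follows. For an even number of values there is no single central element; the usual convention, assumed here, is to average the two middle values. The function name `median` and the data are illustrative.

```python
def median(values):
    """Return the central value of the ordered data."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the middle element
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle elements

odd = [7, 3, 9, 1, 5]    # ordered: 1, 3, 5, 7, 9
even = [7, 3, 9, 1]      # ordered: 1, 3, 7, 9

print(median(odd))   # 5
print(median(even))  # 5.0
```

The standard library function `statistics.median` follows the same convention and can be used instead of a hand-written version.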

Mode

The value that occurs most often in a series, making it the most common value. A series can have more than one mode if several values share the highest frequency.
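
In Python, the standard library function `statistics.multimode` (available from Python 3.8) returns every value that reaches the highest frequency, which also covers series with ties. The data below are invented for the example.

```python
from statistics import multimode

# Hypothetical satisfaction ratings on a 1-5 scale.
ratings = [3, 5, 4, 5, 2, 5, 4]
modes = multimode(ratings)
print(modes)        # [5]

# A series in which two values tie for the highest frequency.
tied = [2, 3, 3, 5, 5, 7]
tied_modes = multimode(tied)
print(tied_modes)   # [3, 5]
```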

Measures of Dispersion (Variability)

Defining Dispersion Measures

Measures that describe how the values in the series are spread, indicating whether they are concentrated or dispersed around a central value.

Mean Deviation

The average of the absolute deviations of the values from the mean. It indicates how far, on average, the values lie from the mean.
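
A short Python sketch, assuming the usual definition of mean deviation as the average absolute deviation about the mean. The data are invented for the example.

```python
# Hypothetical data series.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                   # 40 / 8 = 5.0

# Absolute deviations ignore the sign, so they do not cancel each other out.
abs_deviations = [abs(x - mean) for x in data]

mean_deviation = sum(abs_deviations) / len(data)
print(mean_deviation)   # 1.5
```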

Variance

Measures the average squared distance between the values of the series and the mean.

Standard Deviation

The square root of the variance. Equivalently, it is the square root of the mean of the squared deviations from the mean of the distribution.
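
Both measures can be computed with a few lines of Python. This sketch follows the definition above (dividing by the number of values, i.e. the population variance); the data are invented. Note that sample-based formulas usually divide by n - 1 instead, which is what `statistics.variance` does.

```python
import math
import statistics

# Hypothetical data series.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                                # 5.0

# Variance: average squared distance between each value and the mean.
variance = sum((x - mean) ** 2 for x in data) / n   # 32 / 8 = 4.0

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)                       # 2.0

# Sample variance (divides by n - 1), shown only for comparison.
sample_variance = statistics.variance(data)         # 32 / 7

print(variance, std_dev, sample_variance)
```

Unlike the variance, the standard deviation is expressed in the same units as the original data, which makes it easier to interpret.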