Essential Statistical Concepts for Data Analysis
Fundamentals of Business Statistics
Statistics Definition
The field concerned with the collection, analysis, interpretation, and presentation of data; in business, it supports the decision-making process.
Descriptive Statistics
Covers the collection, description, summarization, and visualization of data. It focuses on the data as they are, without generalizing beyond them.
Inferential Statistics
Uses sample data to build models, draw inferences, and make predictions about the phenomenon under study (for example, predicting how a variable will behave).
Populations and Samples
Population (N)
The complete set of objects or individuals under study that share at least one common characteristic.
Finite Population
A population where the number of elements is finite.
Infinite Population
A population where the number of elements is infinite.
Sample
A subset of the population.
Random Sample
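A sample whose elements are chosen at random, so that every element of the population has a known chance of being selected (an equal chance in simple random sampling).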
Sampling Methods
Stratified Sampling
The population is split into strata (subgroups) according to the characteristic being studied, and elements are then sampled from each stratum.
Systematic Sampling
Selects sample elements according to a fixed pattern or criterion, typically every k-th element of an ordered list after a random starting point.
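The three selection schemes described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the `employees` list, and the fixed seed are all assumptions made for the example, and the systematic version assumes the common "every k-th element after a random start" rule.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, step, seed=None):
    """Pick a random start among the first `step` positions, then take every step-th element."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, key, n_per_stratum, seed=None):
    """Group elements into strata by key(element), then sample at random within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    return {
        label: rng.sample(members, min(n_per_stratum, len(members)))
        for label, members in strata.items()
    }

# Hypothetical population: 20 employees, each tagged with a department.
employees = [(i, "sales" if i % 2 == 0 else "finance") for i in range(20)]

print(simple_random_sample(employees, 5, seed=1))
print(systematic_sample(employees, step=4, seed=1))
print(stratified_sample(employees, key=lambda e: e[1], n_per_stratum=3, seed=1))
```

Passing a seed only makes the illustration repeatable; in a real study the selection would normally not be fixed in advance.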
Variables and Data Types
Variable
Any characteristic observed in a population that can take values. Variables are classified from the point of view of measurement:
Quantitative Variables
Variables that take numeric values. These may be:
- Continuous: Takes any value within a given range (e.g., weight, height, salary, values with decimals).
- Discrete: Takes only integer values (e.g., number of students, number of children).
Qualitative Variables
Variables that describe qualities or categories. These can be:
- Nominal: Categories without inherent order or rank (e.g., civil status).
- Ordinal: Categories that present order and rank (e.g., days of the week, satisfaction levels).
Independent Variable
The variable evaluated for its ability to influence or affect other variables.
Dependent Variable
The variable whose values change as a result of the manipulation of the independent variable; it is the outcome being measured.
Data Representation and Tabulation
Data Entry Table
A simple table in which only the raw data obtained from the research appear. It is used when no further information is needed. The mean and median can be calculated directly from these data.
Frequency Table
Created through tabulation and grouping: each value of the variable is paired with its frequency, the number of times it occurs.
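As a rough sketch of how raw data become a frequency table, the following Python snippet counts occurrences with `collections.Counter`. The `children` data are invented for the example, and the relative-frequency column is an addition that the definition above does not require.

```python
from collections import Counter

# Hypothetical raw data: number of children reported by 10 households.
children = [0, 2, 1, 2, 3, 0, 2, 1, 2, 1]

counts = Counter(children)
total = len(children)

# Each row: value, absolute frequency, relative frequency.
frequency_table = [
    (value, counts[value], counts[value] / total)
    for value in sorted(counts)
]

print("value  frequency  relative")
for value, absolute, relative in frequency_table:
    print(f"{value:>5}  {absolute:>9}  {relative:>8.2f}")
```

The absolute frequencies always add up to the number of observations, which is a quick check that no value was lost during tabulation.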
Double-Entry Tables
Tables used to display data concerning two variables simultaneously (bivariate data).
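A double-entry table can be built by counting how often each pair of values occurs together. The sketch below assumes a small, made-up set of (department, satisfaction) records; the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical bivariate data: (department, satisfaction level) per employee.
records = [
    ("sales", "high"), ("sales", "low"), ("finance", "high"),
    ("sales", "high"), ("finance", "low"), ("finance", "high"),
]

pair_counts = Counter(records)
rows = sorted({dept for dept, _ in records})
cols = sorted({level for _, level in records})

# table[row][col] holds how many records show that combination of values.
table = {r: {c: pair_counts[(r, c)] for c in cols} for r in rows}

print("dept".ljust(10) + "".join(c.rjust(6) for c in cols))
for r in rows:
    print(r.ljust(10) + "".join(str(table[r][c]).rjust(6) for c in cols))
```

Rows correspond to one variable and columns to the other, so each cell answers a question about both variables at once.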
Diagrams and Charts
A pattern formed by lines or shapes used to represent data visually.
Bar Diagram
Used to represent qualitative attributes and discrete quantitative variables.
- Simple Bar Diagram: Graphical representation of unique facts or a single series.
- Multiple Bar Diagram: Advisable for comparing several statistical series.
- Composite Bar Diagram: The bars of the second series are stacked on top of the corresponding bars of the first series.
Bivariate Analysis Diagram
A diagram used for analysis that operates with data on two variables, aiming to discover and study their statistical properties.
Measures of Central Tendency
Arithmetic Mean (Average)
The average value of the data series. It can be computed as a simple or a weighted mean.
Simple Arithmetic Mean
The sum of all elements of the series divided by the total number of elements.
Weighted Arithmetic Mean
The sum of the products of each distinct value in the series and its frequency (weight), divided by the total number of elements, that is, the sum of the frequencies.
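To make the relationship between the two means concrete, here is a small sketch in Python. The `frequencies` mapping is invented for the example; it shows that weighting each value by its frequency gives the same result as the simple mean of the expanded raw data.

```python
# Hypothetical grouped data: value -> frequency (number of times it occurs).
frequencies = {0: 2, 1: 3, 2: 4, 3: 1}

total_count = sum(frequencies.values())
weighted_mean = sum(value * freq for value, freq in frequencies.items()) / total_count

# Expanding the frequencies back into raw observations gives the same simple mean.
raw = [value for value, freq in frequencies.items() for _ in range(freq)]
simple_mean = sum(raw) / len(raw)

print(weighted_mean)  # 1.4
print(simple_mean)    # 1.4
```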
Median
The value located exactly in the center of the data when ordered. When the number of values is even, it is taken as the mean of the two central values.
Mode
The value that occurs most often in a series, making it the most common value. A series can have more than one mode.
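The three measures of central tendency can be computed with Python's standard `statistics` module. The data below are invented for the illustration; `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Hypothetical series of observations (unordered on purpose).
data = [8, 3, 6, 8, 4, 5, 8]

print(statistics.mean(data))    # 6  (42 / 7)
print(statistics.median(data))  # 6  (middle of 3, 4, 5, 6, 8, 8, 8)
print(statistics.mode(data))    # 8  (occurs three times)

# With an even number of values, the median is the mean of the two central ones.
print(statistics.median([3, 4, 5, 6]))  # 4.5

# A series can have several modes; multimode returns all of them.
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```

Here the mean and median coincide while the mode is higher, a reminder that the three measures describe the center of a series in different ways.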
Measures of Dispersion (Variability)
Defining Dispersion Measures
Measures that describe the distribution of the values in the series, examining whether these values are more or less concentrated or more or less dispersed.
Mean Deviation
The average of the absolute deviations of the values from the mean.
Variance
Measures the average squared distance between the values of the series and the mean.
Standard Deviation
The square root of the variance; equivalently, the square root of the mean of the squared deviations from the mean of the distribution.
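The dispersion measures above can be worked through step by step. The sketch below uses an invented series and follows the definitions as written, dividing by the number of values (the population form); the sample form, which divides by n - 1, is a different convention and is shown only for contrast.

```python
import math
import statistics

# Hypothetical series of observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n  # 5.0

# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n  # 1.5

# Variance: average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / n  # 4.0

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)  # 2.0

print(mean_deviation, variance, std_dev)

# The standard library's population functions give the same results.
print(statistics.pvariance(data), statistics.pstdev(data))

# statistics.variance divides by n - 1 instead (sample variance).
print(statistics.variance(data))
```

Because the standard deviation is expressed in the same units as the data, it is usually easier to interpret than the variance.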
