Data Analysis Tasks for Sales and Customer Metrics
Posted on Dec 9, 2025 in Mathematics
Data Analysis Tasks
1. Data Visualization Requirements
1a. Quarterly Sales Distribution
- Create a histogram to illustrate the distribution of the variable Quarterly Sales ($).
- Use the provided bins to create the histogram.
- Use proper titles and remove gaps between bars on the histogram.
1b. Customer Quality Rating Distribution
- Create a pie chart that illustrates the distribution of Customers by Quality Rating.
- Include the category names and percentage labels on the slices of the Pie Chart.
2. Simple Linear Regression: Sales vs. Training Hours
2a. Scatter Plot and Interpretation
- Create a scatter plot that illustrates how Quarterly Sales ($) (Y-axis) is influenced by Training Hours (X-axis).
- Include proper axis titles on your graph.
- Include the regression equation and R-squared value on your graph.
- Identify the Response (Dependent) and Explanatory (Independent) variable for this model.
- What is the interpretation of the SLOPE of the regression line?
2b. Regression Model Calculations
- Create a Simple Linear Regression model to predict Quarterly Sales ($) based on Training Hours.
- Include the Regression output on this sheet (set output range to cell B45).
- Enter the Regression Equation:
- What is the value of the SLOPE of the regression model?
- Enter the value of the INTERCEPT of the regression equation.
- What proportion of variability in Quarterly Sales ($) is explained by the model?
- What is the correlation between Quarterly Sales (S) and Training Hours?
- What is the predicted Quarterly Sales ($) for a customer that has Training Hours = 41?
- What is the residual for the customer in the previous question if his quarterly sales (S) were = $10,643?
- Determine the P value and conclusion for the coefficient of Training Hours using significance level 1%.
3. Contingency Table and Probability
3. Frequency Table and Chart
- Create a Frequency Table and a Double Column chart to illustrate the Distribution of Quality Rating (Rows) by Gender (Column).
- DO NOT INCLUDE TOTALS on your graph (there should be no bars for the total column or row).
Probability Questions
- If a customer with Good rating is randomly selected, what is the probability the customer gender is Male?
- What percentage of customers in the sample have Excellent rating?
4. Hypothesis Test: Quality Rating and Gender Association
- Conduct an appropriate hypothesis test to determine if there is evidence of association between Quality Rating and Gender.
- Select the correct conclusion based on your results. Use significance level $\alpha = 5\%$.
5. Analysis by Quality Rating
5a. Summary Statistics and Visualization
- Use a Pivot Table to determine the Average Quarterly Sales ($) and Number of employees for different QualityRating.
- Create SIDE BY SIDE BOXPLOTS to illustrate the results.
5b. Hypothesis Test for Mean Sales
- Conduct an appropriate hypothesis test to determine if there is evidence of a difference in the average Quarterly Sales ($) across Quality Rating.
- State your conclusion in plain English. Use significance level $\alpha = 5\%$.
6. Age Variable Analysis
6a. Five Number Summary and Box Plot
- Use EXCEL FUNCTIONS (Not templates) to determine the five number Summary for the variable AGE.
- Create a BOX PLOT for the variable AGE to illustrate the results.
6b. Conditional Calculations for Gender
- Use
COUNTIF, AVERAGEIF, and SUMIF to determine the following values for Female and Male account customers.
7. Rebate Calculation
- Suppose the company wants to issue Rebates to customers with Excellent or Good Quality Rating.
- Customers with Excellent Quality Rating will receive Rebates equal to 12% of their QuarterlySales plus $500 coupons for future purchases.
- Customers with Good Quality Rating will receive Rebates equal to 5% of their QuarterlySales plus $250 coupons for future purchases.
- Use appropriate excel formulas and functions to determine the Total Amount of money needed to cover the rebates and coupons and the number of customers who will receive it.
8. Multiple Linear Regression: Sales vs. Income and Training
- Develop a Multiple Linear Regression model to predict Quarterly Sales ($) using Annual Income ($) and Training Hours as independent variables.
- Include the Regression output on this sheet (set output range to cell C28).
- Enter the Regression Equation:
- Enter your answers on the HIGHLIGHTED Cells for correct grading.
- What is the value of the intercept of the regression equation?
- What is the estimated regression coefficient for Annual Income ($)?
- What is the estimated regression coefficient for Training Hours?
- What is the Correlation between Annual Income ($) and Quarterly Sales ($)?
- What is the Correlation between Training Hours and Quarterly Sales ($)?
- What proportion of variability in Quarterly Sales ($) is explained by the regression model?
- What is the value of the F statistic to test the overall regression model?
- What is the predicted Quarterly Sales ($) for a customer that has Annual Income ($ = 67,686 and Training Hours = 40?
- What is the residual for the customer in the previous question if his/her Quarterly Sales ($) = $10,276?
- Determine the P value and conclusion for the coefficient of Annual Income ($) Using significance level 10%.
- Determine the P value and conclusion for the coefficient of Training Hours Using significance level 10%.
9. Goodness-of-Fit Test
- According to a recently published report, 60% of customers have Excellent Quality Rating, 19% had Good Quality Rating, and 21% had Poor Quality Rating.
- Use an appropriate statistical procedure (e.g., Chi-Square) to determine if this sample is consistent with that report? Use significance level $\alpha = 10\%$.
10. Multiple Linear Regression: Sales vs. Gender, Income, and Training
- Develop a Multiple Linear Regression model to predict Quarterly Sales ($) using Gender, Annual Income ($), and Training Hours as independent variables.
- Set a Dummy variable for Gender and code Male=0 and Female=1.
- Include the Regression output on this sheet (set output range to cell J34).
- The Regression Equation:
- What is the value of the intercept of the regression equation?
- What is the estimated regression coefficient for Gender?
- What is the estimated regression coefficient for Annual Income ($)?
- What is the estimated regression coefficient for Training Hours?
- What is the Standard Error of the Regression?
- What proportion of variability in Quarterly Sales (S) is explained by the regression model?
- What is the value of the F statistic to test the overall regression model?
- What is the predicted Quarterly Sales ($) for a MALE customer that has Annual Income ($ = 64,368 and Training Hours = 40?
- What is the residual for the customer in the previous question if Quarterly Sales ($) = $10,328?
- What is the predicted Quarterly Sales ($) for a FEMALE customer that has Annual Income ($) = 64,368 and Training Hours = 40?
- What is the residual for the customer in the previous question if Quarterly Sales ($) = $10,328?
- Determine the P value and conclusion for the coefficient of Gender Using significance level 5%.
- Determine the P value and conclusion for the coefficient of Annual Income ($) Using significance level 5%.
- Determine the P value and conclusion for the coefficient of Training Hours Using significance level 5%.