Excel Import, Cleaning, Transformation & Visualization

Importing and Exporting Data

In data science, importing is the act of bringing external data into Excel for analysis, while exporting is saving your Excel data into a format that other programs can read.

Common File Formats

Understanding these formats helps you choose the right one for your task:

  • Excel Workbook (.xlsx): A workbook that stores data, formulas, formatting, charts, and multiple sheets. Best for your working file.
  • CSV (Comma-Separated Values): A plain text file where a comma separates each piece of data. It is the universal language of data because almost any software (Python, SQL, Tableau) can read it.
  • Note: CSVs do not save formulas or formatting.
  • TXT (Tab-Delimited): Similar to CSV, but uses tabs instead of commas to separate data. This is often used for data that might contain commas within the text (like addresses).

How to Import Data

While you can just double-click a CSV file to open it, using the Get Data feature is better because it prevents Excel from accidentally changing your data (for example, turning a long ID number into scientific notation).

Using Power Query (Recommended)

  1. Go to the Data tab.
  2. Click Get Data > From File > From Text/CSV.
  3. Select your file and click Import.
  4. A preview window will appear. Here you can check:
    • Delimiter: Is it a comma, semicolon, or tab?
    • Data Type Detection: Ensure Excel is not misidentifying your columns.
  5. Click Load to bring it into your sheet or Transform Data to clean it first.

How to Export Data

Exporting allows you to share your results with people who might not use Excel or to upload data into a database.

Save As Method

  1. Go to File > Save As (or press F12).
  2. Choose your destination folder.
  3. In the Save as type dropdown, select:
    • CSV (Comma delimited) (*.csv)
    • Text (Tab delimited) (*.txt)
  4. Click Save.
  5. Warning: Excel will warn you that “some features might be lost.” This is normal; it just means your colors, bold text, and formulas won’t be saved in a CSV file.

Key Differences: Excel vs. CSV

FeatureExcel (.xlsx)CSV (.csv)
FormattingSaves colors, fonts, bordersPlain text only
FormulasSaves active formulasSaves only the result of the formula
Multiple SheetsSupportedOnly the active sheet is saved
File SizeLarger (binary or structured workbook)Very small (text)
CompatibilityMostly Microsoft/GoogleUniversal (works with everything)

Data Cleaning and Preparation

Data cleaning is arguably the most important part of data science. In fact, most data scientists spend about 80% of their time cleaning data and only 20% analyzing it. Dirty data leads to “garbage in, garbage out” results.

Handling Missing Values

Missing data occurs when no value is stored for a variable in an observation. In Excel, these appear as empty cells or #N/A errors.

Strategies to fix it:

  • Deletion: If a row has too many missing values, you might delete the entire row. However, use this cautiously to avoid losing valuable information.
  • Imputation (Filling): Replacing missing values with a logical substitute:
    • Mean/Median: Use the average of the column (best for numerical data).
    • Mode: Use the most frequent value (best for categorical data like City or Color).
    • Constant: Fill with Unknown or 0.

How to do it: Select your data > Home tab > Find & Select > Go To Special… > Blanks. This highlights all empty cells so you can fill them at once.

Removing Duplicates

Duplicate records can skew your results (for example, counting a single sale twice).

How to fix it:

  1. Select your data range.
  2. Go to the Data tab.
  3. Click Remove Duplicates.
  4. Choose which columns to check. If you select all columns, it will only remove rows that are 100% identical.

Correcting Structural Errors

Structural errors are typos, inconsistent capitalization, or extra spaces that make the same data look different to a computer (for example, New York, new york, and New York ).

Key Excel Functions for Cleaning

  • =TRIM(text): Removes extra spaces from a cell except for single spaces between words.
  • =PROPER(text): Capitalizes the first letter of every word (fixes names and cities).
  • =CLEAN(text): Removes non-printable characters often found in data exported from old databases.

Handling Outliers and Data Errors

Errors are values that are technically impossible (for example, a Birth Year of 2099) or outliers that are so extreme they distort the average.

How to identify and fix them:

  • Data Validation: Set rules to prevent errors from being entered in the first place (for example, Age must be between 0 and 120).
  • Sorting: Sort your data from smallest to largest. If the first or last few rows look impossible, they are likely errors.
  • Find & Replace (Ctrl + H): Quickly swap incorrect terms across the entire sheet (for example, replace USA with United States).

Power Query for Cleaning

For large datasets, use Power Query (Data > Get Data). It records your cleaning steps like a macro so that when you import new data next month, you can hit Refresh and all the cleaning happens automatically.

Data Transformation in Excel

Data transformation involves changing the structure, format, or values of your data to make it more useful for analysis. In Excel, this often means splitting combined data, joining separate pieces of information, or controlling what can be entered into a cell.

Text-to-Columns: Splitting Data

This feature is used when multiple pieces of information are trapped in a single cell (for example, John Doe in one cell when you need First Name and Last Name separately).

How to use it:

  1. Highlight the column you want to split.
  2. Go to the Data tab > Text to Columns.
  3. Choose Delimited (if there is a character like a comma or space separating the data) or Fixed Width (if the data is always the same length).
  4. Select your delimiter (for example, Space or Comma) and click Finish.

Concatenation: Joining Data

Concatenation is the opposite of Text-to-Columns; it combines data from two or more cells into one.

Method A: The Ampersand (&) Operator

This is the quickest way to join cells.

Formula: =A2 & " " & B2

Result: If A2 is Data and B2 is Science, the result is Data Science. (The ” ” adds a space between the words.)

Method B: The TEXTJOIN Function (Modern Excel)

This is better for joining long lists because it handles delimiters and empty cells automatically.

Formula: =TEXTJOIN(" ", TRUE, A2:C2)

Result: Joins everything in the range A2 to C2 with a space in between, skipping any empty cells.

Data Validation: Preventing Errors

Data validation allows you to set rules on what can be typed into a cell. This is proactive data cleaning—it stops dirty data before it enters your system.

Common Uses:

  • Dropdown Lists: Create a predefined list of options (for example, Paid, Pending, Overdue) to ensure consistent spelling.
  • Date/Number Restrictions: Ensure a Start Date is not in the future or that Age is between 0 and 120.
  • Input Messages: Show a small pop-up note when a user clicks the cell, explaining what they should type.

How to set it up: Select the cells > Data tab > Data Validation > Choose your criteria under the Settings tab.

Flash Fill: The Smart Transformation

Flash Fill is an AI-powered feature that senses patterns. If you start typing the split or joined version of your data in the column next to your raw data, Excel will recognize the pattern.

  • Shortcut: Press Ctrl + E to have Excel automatically fill the rest of the column based on your example.

Summary Table

GoalTool/FunctionBest For…
SplitText-to-ColumnsTurning “City, State” into two columns.
Join& or TEXTJOINCreating full names or addresses.
RestrictData ValidationCreating dropdown menus or enforcing rules.
AutomateFlash FillQuick one-time patterns without formulas.

Data Visualization and Charts

Data visualization is the storytelling phase of data science. While tables show you the numbers, charts show you the trends, patterns, and outliers that might otherwise be missed.

Choosing the Right Chart

The most important step in visualization is matching your data to the correct chart type.

Chart TypeBest Used For…Example
Bar / ColumnComparing categories or quantitiesSales by region; Top 5 products
Line ChartShowing trends over timeMonthly revenue; Temperature changes
Pie ChartShowing parts of a whole (proportions)Market share; Budget breakdown

How to Create a Chart

In Excel, creating a chart is a three-step process:

  1. Select Your Data: Highlight the cells including your headers (for example, Month and Sales).
  2. Insert the Chart: Go to the Insert tab and select the icon for the chart you want.
  3. Refine: Click the Recommended Charts button if you aren’t sure which style fits best.

Customizing Your Chart

Once a chart is created, Excel provides three icons on the top-right of the chart to help you customize it.

Chart Elements (+ Icon)

This is where you add clarity to your story.

  • Axis Titles: Always label your X and Y axes (for example, Months and USD).
  • Data Labels: Display the exact numbers on top of bars or lines.
  • Trendlines: For line or bar charts, add a line that shows the general direction of the data.

Chart Styles (Paintbrush Icon)

  • Color Palettes: Change the color scheme to match your brand or to highlight specific points.
  • Pre-set Styles: Quickly switch between a minimalist look or a high-contrast “dark mode” style.

Chart Filters (Funnel Icon)

Temporarily hide data series (for example, viewing only Product A and Product B without deleting Product C from the chart).

Visualization Best Practices

To ensure your charts are effective and not misleading:

  • Start the Y-Axis at Zero: Starting at a higher number can exaggerate small differences (common in bar charts).
  • Don’t Overcrowd: A pie chart with 20 slices is impossible to read. Stick to 5–7 slices maximum.
  • Remove Chart Junk: Delete unnecessary gridlines or 3D effects that distract from the data.
  • Sort Bar Charts: Unless you are showing a timeline, sort your bars from largest to smallest to make comparisons easier.

Pro Tip: Combination Charts

Sometimes one chart isn’t enough. You can create a combo chart (for example, a bar chart for Sales and a line chart for Profit Margin on the same graph).

How: Go to Insert > Combo Chart > Clustered Column – Line on Secondary Axis.