Python Data Essentials: Pandas, NumPy, and String Methods

Introduction to Pandas

Pandas is a powerful and flexible Python library used for data manipulation, analysis, and cleaning. It is suitable for handling different kinds of data, such as:

  • Tabular data with heterogeneously typed columns (for example, text, integers, and floats in one table).
  • Ordered and unordered time-series data (not necessarily recorded at a fixed frequency).
  • Arbitrary matrix data with row and column labels.
  • Other observational or statistical datasets, which do not need to be labeled at all.
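
As a minimal sketch of the first case, the snippet below builds a small table whose columns hold different types. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meera"],     # text
        "age": [28, 35, 41],                   # integers
        "salary": [52000.0, 61000.5, 58000.0], # floats
    }
)

print(df.dtypes)  # one dtype per column
print(df.shape)   # (number of rows, number of columns)
```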

Essential Pandas Operations

Pandas offers numerous functions for data wrangling. Here are key operations:

  • Slicing DataFrames: Extracts specific rows or columns, by label with .loc or by position with .iloc. A DataFrame is the two-dimensional, labeled data structure in Pandas.
  • Merging & Joining: merge() combines two DataFrames on one or more common columns. join() is similar but matches rows on the index by default.
  • Concatenation: concat() stacks multiple DataFrames together vertically (adding rows) or horizontally (adding columns).
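
The three operations above can be sketched on two small, made-up tables. The names employees, salaries, and emp_id are assumptions chosen for this example only.

```python
import pandas as pd

# Hypothetical sample tables sharing an "emp_id" column.
employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
salaries = pd.DataFrame({"emp_id": [1, 2, 4], "salary": [52000, 61000, 45000]})

# Slicing: first two rows by position, then one column by label.
first_two = employees.iloc[:2]
names = employees["name"]

# Merging: match rows on the shared column (inner join by default).
merged = pd.merge(employees, salaries, on="emp_id")

# Joining: match rows on the index instead (left join by default).
joined = employees.set_index("emp_id").join(salaries.set_index("emp_id"))

# Concatenation: stack rows (default axis=0) or place tables side by side (axis=1).
stacked = pd.concat([employees, employees], ignore_index=True)
side_by_side = pd.concat([employees, salaries], axis=1)

print(merged)
print(joined)
```

Note that the inner merge keeps only the IDs present in both tables, while the left join keeps every employee and fills the missing salary with NaN.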

Modifying DataFrames

  • Changing Index: .set_index() turns one or more columns into the index, and .reset_index() restores the default integer index.
  • Renaming Columns: Change column names using .rename(columns={...}), which returns a new DataFrame by default.
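
A short sketch of both modifications, using a made-up two-column table (the column names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample table.
df = pd.DataFrame({"emp_id": [101, 102], "nm": ["Asha", "Ravi"]})

indexed = df.set_index("emp_id")             # "emp_id" becomes the index
restored = indexed.reset_index()             # back to the default 0..n-1 index
renamed = df.rename(columns={"nm": "name"})  # returns a new DataFrame

print(indexed)
print(renamed.columns.tolist())
```

None of these calls modifies df itself; each returns a new DataFrame.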

Data Munging

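Data munging (also called data wrangling) refers to transforming raw data from one format into another that is better suited to analysis, for example by trimming stray whitespace, converting text to numbers, or marking missing values.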
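
One possible munging pass is sketched below on made-up raw data: it tidies text, converts a text column to numbers, and then exports the result as a list of dictionaries. The column names and the "n/a" placeholder are assumptions for this example.

```python
import pandas as pd

# Hypothetical raw data: untidy text and numbers stored as strings.
raw = pd.DataFrame(
    {"city": [" Lucknow", "delhi ", "Mumbai"], "temp_c": ["31", "n/a", "29.5"]}
)

clean = raw.copy()
# Trim whitespace and normalize capitalization.
clean["city"] = clean["city"].str.strip().str.title()
# Convert text to numbers; values that cannot be parsed become NaN.
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")

# Change the format: one dictionary per row.
records = clean.to_dict(orient="records")
print(records)
```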

Python NumPy Fundamentals

NumPy (Numerical Python) is a core Python library used for scientific computing.

  • It provides an N-dimensional array object, which helps with efficient data storage and manipulation.
  • NumPy supports linear algebra, random number generation, and integration with other languages (C, C++).
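
As a small illustration of the linear algebra and random number support, the sketch below solves a made-up 2x2 linear system and draws a few random values. The matrix, vector, and seed are arbitrary choices.

```python
import numpy as np

# Hypothetical system: 2x + y = 3 and x + 3y = 5.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)  # solves A @ x = b
print(x)

# Random numbers from a seeded generator, so runs are repeatable.
rng = np.random.default_rng(seed=0)
samples = rng.random(5)    # 5 floats in the interval [0, 1)
print(samples)
```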

NumPy Arrays

Single-Dimensional Array

Stores elements in a sequence. For example:

import numpy as np
a = np.array([1,2,3])
print(a)

Multi-Dimensional Array

Stores elements along two or more axes (such as rows and columns), which makes matrix operations possible. For example:

a = np.array([(1,2,3), (4,5,6)])
print(a)

Why Use NumPy Instead of Python Lists?

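  • Less Memory Usage: A NumPy array stores its elements in one contiguous buffer of a single fixed type, whereas a list stores a reference to a separate Python object for each element.
  • Faster Execution: Operations on whole arrays run in compiled code instead of an interpreted Python loop.
  • Convenience: Provides built-in functions for efficient data manipulation.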
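
The memory difference can be estimated with a rough sketch like the one below. The list figure is only an approximation (CPython shares some small integer objects), and the element count of 1000 is an arbitrary choice.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Approximate list cost: the list's own reference table plus one int object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
# Array cost: the raw data buffer only, n elements of 8 bytes each.
array_bytes = np_array.nbytes

print("list :", list_bytes, "bytes (approximate)")
print("array:", array_bytes, "bytes")

# Convenience: one vectorized expression replaces an explicit loop.
doubled_list = [v * 2 for v in py_list]
doubled_array = np_array * 2
```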

Core NumPy Operations

  • Find Dimension (ndim): Returns the number of dimensions (axes) of an array: 1 for a single-dimensional array, 2 for a matrix, and so on.
    a = np.array([(1,2,3), (4,5,6)])
    print(a.ndim) # Output: 2
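  • Byte Size (itemsize): Gives the size of each element in bytes.
    a = np.array([(1,2,3)])
    print(a.itemsize) # Output: 8 with the usual int64 default (4 where the default integer is int32)
  • Data Type (dtype): Identifies the data type of elements in an array.
    a = np.array([(1,2,3)])
    print(a.dtype) # Output: int64 on most 64-bit platforms (int32 on 32-bit systems and on Windows with NumPy older than 2.0)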
  • Array Size & Shape: size gives the total number of elements, and shape gives the length of each dimension (rows and columns for a 2-D array).
    a = np.array([(1,2,3,4,5,6)])
    print(a.size)  # Output: 6
    print(a.shape) # Output: (1, 6)
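  • Reshape (reshape): Rearranges the same elements into a different row-column structure. The total number of elements must stay the same.
    a = np.array([(8,9,10), (11,12,13)])
    a = a.reshape(3,2) # 2 rows x 3 columns becomes 3 rows x 2 columns
    print(a)
  • Linspace (linspace): Generates a given number of evenly spaced values over an interval, including both endpoints by default.
    a = np.linspace(1,3,10) # 10 values from 1 to 3
    print(a)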
  • Finding Min/Max/Sum: Get statistical values from an array.
    a = np.array([1,2,3])
    print(a.min()) # Output: 1
    print(a.max()) # Output: 3
    print(a.sum()) # Output: 6
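
To make the effect of reshape, linspace, and the summary methods concrete, here is a small sketch with values that are easy to check by hand. The axis argument is an addition not covered in the list above: it applies the operation along one dimension instead of over the whole array.

```python
import numpy as np

a = np.array([(8, 9, 10), (11, 12, 13)])

# Reshape keeps the same 6 elements, read row by row.
b = a.reshape(3, 2)
print(b.tolist())         # [[8, 9], [10, 11], [12, 13]]

# Five evenly spaced points from 1 to 3, endpoints included.
pts = np.linspace(1, 3, 5)
print(pts.tolist())       # [1.0, 1.5, 2.0, 2.5, 3.0]

# axis=0 works down the columns, axis=1 works across the rows.
col_sums = a.sum(axis=0)
row_max = a.max(axis=1)
print(col_sums.tolist())  # [19, 21, 23]
print(row_max.tolist())   # [10, 13]
```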

Introduction to Strings in Python

  • A string is a sequence of characters enclosed in single ('), double ("), or triple (''' or """) quotes.
  • Strings are immutable, meaning they cannot be modified after creation. Operations that appear to change a string return a new string instead.
  • Python provides built-in string methods for easy manipulation.
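
Immutability can be seen directly in a short sketch: assigning to a character position fails, so a changed string has to be built as a new object.

```python
s = "hello"

# In-place modification is not allowed on strings.
try:
    s[0] = "H"
    mutated = True
except TypeError:
    mutated = False

# Build a new string instead; the original is untouched.
t = "H" + s[1:]

print(mutated)  # False
print(s, t)     # hello Hello
```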

Changing Case in Strings

  • str.upper(): Converts all characters to uppercase.
  • str.lower(): Converts all characters to lowercase.

Example:

ss = "Softpro India"
print(ss.upper()) # Output: SOFTPRO INDIA
print(ss.lower()) # Output: softpro india

Joining, Splitting, and Replacing Strings

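  • str.join(): Joins an iterable of strings, using the string it is called on as the separator.
  • str.split(): Splits a string into a list of substrings, on whitespace by default or on a given separator.
  • str.replace(): Returns a copy of the string with every occurrence of a substring replaced by another value.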
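
A brief sketch of the three methods working together, using an arbitrary sample sentence:

```python
sentence = "pandas numpy strings"

words = sentence.split()       # split on whitespace
csv_line = ",".join(words)     # "," is the separator between items
fields = csv_line.split(",")   # split on an explicit separator
swapped = sentence.replace("numpy", "NumPy")

print(words)     # ['pandas', 'numpy', 'strings']
print(csv_line)  # pandas,numpy,strings
print(swapped)   # pandas NumPy strings
```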

Boolean String Methods

  • Used for validating input types (e.g., names should be alphabetic, postal codes should be numeric).
  • Methods return True or False based on string properties:
    • str.isalnum(): Checks if all characters are letters or digits.
    • str.isalpha(): Checks if all characters are alphabetic (a space makes it return False).
    • str.isnumeric(): Checks if all characters are numeric, including Unicode numerals such as ½.
    • str.isspace(): Checks if the string contains only whitespace.
    • str.isupper(): Checks if all cased characters in the string are uppercase.
    • str.islower(): Checks if all cased characters in the string are lowercase.
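
The validation idea can be sketched with two small helpers. The function names and the six-digit postal code length are assumptions made for this example, not rules from any particular system. The postal check uses isdigit() with isascii() because isnumeric() also accepts characters such as ½.

```python
def is_valid_name(name):
    """Hypothetical rule: one or more words, each made only of letters."""
    parts = name.split()
    return bool(parts) and all(p.isalpha() for p in parts)


def is_valid_postal_code(code, length=6):
    """Hypothetical rule: exactly `length` ASCII digits (6 is an assumed default)."""
    return len(code) == length and code.isascii() and code.isdigit()


print("Softpro India".isalpha())       # False, because of the space
print(is_valid_name("Softpro India"))  # True
print(is_valid_name("R2 D2"))          # False
print(is_valid_postal_code("226001"))  # True
print("½".isnumeric(), "½".isdigit())  # True False
```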

Checking Palindromes

  • A palindrome reads the same forward and backward.
  • Example:
    python_string = input("Enter a string: ")
    reverse_string = "".join(reversed(python_string))
    if python_string == reverse_string:
        print("String is a palindrome")
    else:
        print("String is not a palindrome")
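
A direct comparison treats "Madam" as a non-palindrome because of the capital letter. One possible variant, sketched below as a hypothetical helper, ignores case, spaces, and punctuation before comparing.

```python
def is_palindrome(text):
    """Hypothetical helper: compare only letters and digits, ignoring case."""
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


print(is_palindrome("Madam"))                           # True
print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("Python"))                          # False
```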

Generating Shortened Names

Converts a full name into an abbreviated format.

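name = input("Enter your full name: ")
shortname = name.split()  # splits on any run of whitespace, so extra spaces are harmless
print("Your short name:", end=" ")
# Print the initial of every part except the last one
for part in shortname[:-1]:
    print(part[0] + ".", end="")
# Print the last part in full (guarding against empty input)
print(shortname[-1] if shortname else "")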
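
The same logic can be wrapped in a function so it can be reused and checked without typing input. This is a sketch; the function name short_name is an assumption for illustration.

```python
def short_name(full_name):
    """Hypothetical helper: initials for all parts except the last."""
    parts = full_name.split()
    if not parts:
        return ""
    initials = "".join(p[0] + "." for p in parts[:-1])
    return initials + parts[-1]


print(short_name("Mohandas Karamchand Gandhi"))  # M.K.Gandhi
print(short_name("  Asha   Rao "))               # A.Rao
print(short_name("Asha"))                        # Asha
```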

Replacing Words in a Sentence

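Finds every occurrence of a piece of text in a sentence and replaces it. Note that str.replace() matches substrings, so replacing "cat" also changes the "cat" inside "concatenate".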

sentence = input("Enter a sentence: ")
fw = input("Find what? ")
rw = input("Replace with: ")
print("Modified sentence: " + sentence.replace(fw, rw))
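
When only whole words should change, one option is a regular expression with word boundaries. The helper below is a sketch with a hypothetical name, and it assumes the search text starts and ends with a letter, digit, or underscore.

```python
import re


def replace_word(sentence, find, repl):
    """Hypothetical helper: replace `find` only where it appears as a whole word."""
    pattern = r"\b" + re.escape(find) + r"\b"
    # A function is used as the replacement so `repl` is inserted literally.
    return re.sub(pattern, lambda match: repl, sentence)


text = "the cat concatenates"
print(text.replace("cat", "dog"))       # the dog condogenates
print(replace_word(text, "cat", "dog")) # the dog concatenates
```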

Number Format Conversions

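Converts a decimal integer into its binary, octal, and hexadecimal representations. The built-in bin(), oct(), and hex() functions return strings with a "0b", "0o", or "0x" prefix, which the code strips.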

n = int(input("Enter a number: "))
print("Binary format:", bin(n).replace("0b", ""))
print("Octal format:", oct(n).replace("0o", ""))
print("Hexadecimal format:", hex(n).replace("0x", ""))
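
An alternative sketch uses format(), which produces the digits without a prefix, so no replace() call is needed. The sample value 255 is arbitrary, and int() with a base argument converts the text back to a number.

```python
n = 255  # arbitrary sample value instead of user input

binary = format(n, "b")       # '11111111'
octal = format(n, "o")        # '377'
hexadecimal = format(n, "x")  # 'ff'

padded = f"{5:08b}"   # zero-padded to 8 digits: '00000101'
prefixed = f"{n:#x}"  # keeps the prefix: '0xff'

# Convert back from text by naming the base.
round_trip = int(hexadecimal, 16)

print(binary, octal, hexadecimal, padded, prefixed, round_trip)
```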