Python Data Essentials: Pandas, NumPy, and String Methods
Introduction to Pandas
Pandas is a powerful and flexible Python library used for data manipulation, analysis, and cleaning. It is suitable for handling different kinds of data, such as:
- Tabular data with heterogeneous columns (different types of data in a single dataset).
- Ordered and unordered time-series data (time-stamped observations, whether or not they are sorted chronologically).
- Arbitrary matrix data with row & column labels.
- Any other observational or statistical data set; the data does not need to be labeled to be placed in a Pandas data structure.
Essential Pandas Operations
Pandas offers numerous functions for data wrangling. Here are key operations:
- Slicing DataFrames: Extracts specific rows or columns from a DataFrame, the two-dimensional labeled data structure in Pandas. Use .loc to select by label and .iloc to select by position.
- Merging & Joining: merge() combines two DataFrames into one by matching values in one or more common key columns. join() is similar but, by default, matches rows on the index instead of on columns.
- Concatenation: Stacks multiple DataFrames together vertically (rows) or horizontally (columns).
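The operations above can be sketched with a small example. The employees/departments tables and their column names are hypothetical, invented only for illustration; the calls themselves (.loc, .iloc, pd.merge, DataFrame.join, pd.concat) are standard pandas.

```python
import pandas as pd

# Hypothetical example data: employees and their departments
employees = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Asha", "Ravi", "Meena"],
    "dept_id": [10, 20, 10],
})
departments = pd.DataFrame({
    "dept_id": [10, 20],
    "dept_name": ["Sales", "IT"],
})

# Slicing: label-based (.loc) and position-based (.iloc) selection
first_two_names = employees.loc[0:1, "name"]  # .loc slices include the end label
first_two_rows = employees.iloc[0:2]          # .iloc slices exclude the end position

# Merging: match rows on a common column
merged = pd.merge(employees, departments, on="dept_id")

# Joining: match rows on the index of both frames
emp_by_dept = employees.set_index("dept_id")
joined = emp_by_dept.join(departments.set_index("dept_id"))

# Concatenation: stack frames vertically (the default, axis=0)
new_hire = pd.DataFrame({"emp_id": [4], "name": ["Kiran"], "dept_id": [20]})
stacked = pd.concat([employees, new_hire], ignore_index=True)

print(merged)
print(stacked)
```

Passing axis=1 to pd.concat would place the frames side by side instead; ignore_index=True renumbers the stacked rows 0..n-1 rather than repeating the original labels.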
Modifying DataFrames
- Changing Index: Set a column as the index with .set_index(), or move the index back into a regular column with .reset_index().
- Renaming Columns: Change column names with .rename().
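A minimal sketch of these three methods on a small, hypothetical DataFrame (the column names "id" and "nm" are made up for the example). Each call returns a new DataFrame rather than modifying the original.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"id": [101, 102], "nm": ["Asha", "Ravi"]})

# Make the "id" column the index
indexed = df.set_index("id")

# Move the index back into an ordinary column, with a fresh 0..n-1 index
restored = indexed.reset_index()

# Rename columns by passing an old-name -> new-name mapping
renamed = restored.rename(columns={"nm": "name"})

print(indexed)
print(renamed)
```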
Data Munging
Data munging (also called data wrangling) refers to cleaning raw data and transforming it from one format into another that is more suitable for analysis.
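As one illustration, the sketch below cleans a column's type and then reshapes a table from "long" format (one row per city and year) to "wide" format (one row per city). The sales figures and city names are hypothetical; pivot() is just one of several pandas reshaping tools.

```python
import pandas as pd

# Hypothetical raw data in long format, with numbers stored as text
raw = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Pune", "Pune"],
    "year": [2022, 2023, 2022, 2023],
    "sales": ["120", "135", "80", "95"],
})

# Step 1: fix the column type (text -> integers)
raw["sales"] = raw["sales"].astype(int)

# Step 2: reshape long -> wide: one row per city, one column per year
wide = raw.pivot(index="city", columns="year", values="sales")
print(wide)
```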
Python NumPy Fundamentals
NumPy (Numerical Python) is a core Python library used for scientific computing.
- It provides an N-dimensional array object, which helps with efficient data storage and manipulation.
- NumPy supports linear algebra and random number generation, and provides tools for integrating code written in other languages (C, C++, Fortran).
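A short sketch of the linear algebra and random number features mentioned above. The equation system and the dice-roll scenario are arbitrary examples; np.linalg.solve and np.random.default_rng are standard NumPy functions.

```python
import numpy as np

# Linear algebra: solve the system 2x + y = 5, x + 3y = 10
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
constants = np.array([5.0, 10.0])
solution = np.linalg.solve(coefficients, constants)
print(solution)  # approximately x = 1, y = 3

# Random numbers: a seeded Generator gives reproducible results
rng = np.random.default_rng(seed=42)
dice_rolls = rng.integers(1, 7, size=5)  # integers from 1 to 6 (upper bound excluded)
print(dice_rolls)
```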
NumPy Arrays
Single-Dimensional Array
Stores elements in a sequence. For example:
import numpy as np
a = np.array([1,2,3])
print(a)
Multi-Dimensional Array
Stores elements in rows and columns (and, more generally, along any number of axes), which makes matrix operations possible. For example:
a = np.array([(1,2,3), (4,5,6)])
print(a)
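To show what "matrix operations" means in practice, here is a small sketch using the same 2 x 3 array: element-wise arithmetic, the transpose (.T), and matrix multiplication with the @ operator.

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])

# Element-wise arithmetic applies to every entry at once
doubled = a * 2  # every entry multiplied by 2

# Matrix operations: transpose and matrix multiplication with @
product = a @ a.T  # (2 x 3) @ (3 x 2) -> 2 x 2 matrix
print(product)
# [[14 32]
#  [32 77]]
```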
Why Use NumPy Instead of Python Lists?
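- Less Memory Usage: A NumPy array stores its elements as raw values of a single fixed type in one contiguous block of memory, whereas a list stores pointers to separate Python objects, so arrays typically need much less space.
- Faster Execution: Operations on whole arrays run in compiled code instead of a Python-level loop, so they are usually much quicker than the equivalent list code.
- Convenience: NumPy provides built-in functions for mathematical operations and efficient data manipulation.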
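The memory difference can be estimated with a quick sketch. The list figure below is only a rough estimate (sys.getsizeof of the list plus each integer object; CPython shares small integer objects, so real usage can be lower), and int64 is chosen explicitly so the array size is the same on every platform.

```python
import sys
import numpy as np

n = 1000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)

# Rough list estimate: the list's pointer storage plus one int object per element
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(x) for x in numbers_list)
# The array holds the raw 8-byte values in a single buffer
array_bytes = numbers_array.nbytes

print(list_bytes, array_bytes)

# Vectorized: one expression, no explicit Python loop
squares_array = numbers_array ** 2
squares_list = [x ** 2 for x in numbers_list]
```

Both squaring approaches give the same values; the array version does the looping inside NumPy's compiled code.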
Core NumPy Operations
- Find Dimension (ndim): Returns the number of dimensions (axes) of an array, so you can tell whether it is one-dimensional or multi-dimensional.
a = np.array([(1,2,3), (4,5,6)])
print(a.ndim)  # Output: 2
- Byte Size (itemsize): Returns the size of each element in bytes.
a = np.array([(1,2,3)])
print(a.itemsize)  # Output: 8 where the default integer is int64 (4 where it is int32)
- Data Type (dtype): Identifies the data type of the elements in an array.
a = np.array([(1,2,3)])
print(a.dtype)  # Output: int64 on most platforms (int32 where that is the default integer type)
- Array Size & Shape: size gives the total number of elements, and shape gives the length of each dimension (rows and columns for a 2-D array).
a = np.array([(1,2,3,4,5,6)])
print(a.size)   # Output: 6
print(a.shape)  # Output: (1, 6)
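- Reshape (reshape): Returns the same elements arranged in a different row-column structure; the total number of elements must stay the same.
a = np.array([(8,9,10), (11,12,13)])
a = a.reshape(3,2)
print(a)
# Output:
# [[ 8  9]
#  [10 11]
#  [12 13]]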
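- Linspace (linspace): Generates a given number of evenly spaced values over an interval; np.linspace(1, 3, 10) returns 10 values from 1 to 3, including both endpoints.
a = np.linspace(1,3,10)
print(a)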
- Finding Min/Max/Sum: Computes summary statistics of an array.
a = np.array([1,2,3])
print(a.min())  # Output: 1
print(a.max())  # Output: 3
print(a.sum())  # Output: 6
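The itemsize and dtype results in this list depend on the platform's default integer type. A minimal sketch shows that passing dtype explicitly removes that ambiguity, and that sum() (like min() and max()) also accepts an axis argument for 2-D arrays.

```python
import numpy as np

# Fixing the dtype makes itemsize and dtype the same on every platform
a32 = np.array([1, 2, 3], dtype=np.int32)
a64 = np.array([1, 2, 3], dtype=np.int64)
print(a32.itemsize, a32.dtype)  # 4 int32
print(a64.itemsize, a64.dtype)  # 8 int64

# Statistics along an axis of a 2-D array
m = np.array([(1, 2, 3), (4, 5, 6)])
print(m.sum(axis=0))  # column sums: [5 7 9]
print(m.sum(axis=1))  # row sums: [ 6 15]
```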
Introduction to Strings in Python
- A string is a sequence of characters enclosed in single ('), double ("), or triple (''' or """) quotes.
- Strings are immutable, meaning they cannot be modified after creation; methods such as upper() or replace() return a new string instead.
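A quick sketch of immutability: assigning to a single character raises a TypeError, while a method such as lower() leaves the original untouched and returns a new string.

```python
s = "Softpro"

try:
    s[0] = "s"  # item assignment is not allowed on str
except TypeError as err:
    print("Error:", err)

# "Modifying" a string really creates a new string object
t = s.lower()
print(s, t)  # Softpro softpro
```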
- Python provides built-in string functions for easy manipulation.
Changing Case in Strings
- str.upper(): Converts all characters to uppercase.
- str.lower(): Converts all characters to lowercase.
Example:
ss = "Softpro India"
print(ss.upper()) # Output: SOFTPRO INDIA
print(ss.lower()) # Output: softpro india
Joining, Splitting, and Replacing Strings
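- str.join(iterable): Joins the strings in an iterable into one string, using the string it is called on as the separator.
- str.split(): Splits a string into a list of substrings; with no argument it splits on runs of whitespace, which yields a list of words.
- str.replace(old, new): Returns a copy of the string with every occurrence of old replaced by new.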
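A minimal sketch of the three methods working together; the sample sentence is arbitrary.

```python
words = "Python data essentials".split()  # ['Python', 'data', 'essentials']
csv_line = ",".join(words)                # 'Python,data,essentials'
parts = csv_line.split(",")               # split on an explicit separator
updated = csv_line.replace(",", " | ")    # 'Python | data | essentials'
print(words, csv_line, parts, updated)
```

Note that join() is called on the separator, not on the list: ",".join(words), not words.join(",").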
Boolean String Methods
- Used for validating input (e.g., names should be alphabetic, postal codes should be numeric).
- Each method returns True or False based on the string's contents, and all of them return False for an empty string.
- str.isalnum(): Checks if all characters are letters or digits.
- str.isalpha(): Checks if all characters are alphabetic.
- str.isnumeric(): Checks if all characters are numeric.
- str.isspace(): Checks if the string contains only whitespace.
- str.isupper(): Checks if all cased characters are uppercase (there must be at least one).
- str.islower(): Checks if all cased characters are lowercase (there must be at least one).
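A sketch of the validation use case described above. The function validate_record and its rules (letters-only names, 6-digit postal codes) are hypothetical assumptions for illustration. Because isalpha() returns False for a string containing spaces, the name is checked word by word.

```python
def validate_record(name, postal_code):
    """Hypothetical check: name must be alphabetic, postal code must be 6 digits."""
    errors = []
    # isalpha() rejects spaces, so check each word of the name separately
    if not name.strip() or not all(word.isalpha() for word in name.split()):
        errors.append("name must contain only letters")
    if not (postal_code.isnumeric() and len(postal_code) == 6):
        errors.append("postal code must be 6 digits")
    return errors


print(validate_record("Ram Kumar", "226010"))  # []
print(validate_record("R2D2", "22A01"))
```

isnumeric() also accepts some non-ASCII numeric characters (such as fractions); str.isdecimal() is the stricter choice if only the digits 0 to 9 should pass.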
Checking Palindromes
- A palindrome reads the same forward and backward.
- Example:
python_string = input("Enter a string: ")
# reversed() yields the characters back to front; join() rebuilds them into a string
reverse_string = "".join(reversed(python_string))
if python_string == reverse_string:
    print("The string is a palindrome")
else:
    print("The string is not a palindrome")
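The same check can be written as a reusable function using slicing, where text[::-1] produces a reversed copy. This sketch assumes a looser definition than the example above: case, spaces, and punctuation are ignored, so phrases like "Never odd or even" count as palindromes.

```python
def is_palindrome(text):
    """Return True if text reads the same forward and backward.

    Assumption: case and non-alphanumeric characters are ignored.
    """
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]  # [::-1] is a reversed copy of the string


print(is_palindrome("Never odd or even"))  # True
print(is_palindrome("python"))             # False
```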
Generating Shortened Names
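Converts a full name into an abbreviated format: every name except the last is reduced to its initial (e.g., "Ram Kumar Sharma" becomes "R.K.Sharma").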
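name = input("Enter your full name: ")
# split() with no argument ignores extra spaces between names
shortname = name.split()
print("Your short name: ", end="")
for part in shortname[:-1]:
    # Print the initial of every name except the last one
    print(part[0] + ".", end="")
print(shortname[-1])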
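The same logic can be packaged as a function that returns the result instead of printing it, which makes it easier to reuse and test. The name short_name is a hypothetical helper, not part of any library.

```python
def short_name(full_name):
    """Abbreviate every name except the last to its initial, e.g. 'R.K.Sharma'."""
    parts = full_name.split()
    initials = "".join(part[0] + "." for part in parts[:-1])
    return initials + parts[-1]


print(short_name("Ram Kumar Sharma"))  # R.K.Sharma
```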
Replacing Words in a Sentence
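Searches a sentence for a word and replaces every occurrence of it. Note that str.replace() matches substrings, so it also changes matches inside longer words.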
sentence = input("Enter a sentence: ")
fw = input("Find what? ")
rw = input("Replace with: ")
print("Modified sentence: " + sentence.replace(fw, rw))
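If only whole words should be replaced, one option is to swap in a regular expression with word boundaries (\b) from Python's re module. The helper replace_word is hypothetical, and the sketch assumes the search word starts and ends with letters or digits, since \b only marks boundaries next to word characters.

```python
import re


def replace_word(sentence, find, replace_with):
    """Replace only whole-word matches of find (regex-based alternative to str.replace)."""
    pattern = r"\b" + re.escape(find) + r"\b"
    # A function replacement stops backslashes in replace_with from being interpreted
    return re.sub(pattern, lambda match: replace_with, sentence)


text = "the cat sat on the catalog"
print(text.replace("cat", "dog"))          # the dog sat on the dogalog
print(replace_word(text, "cat", "dog"))    # the dog sat on the catalog
```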
Number Format Conversions
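Converts a decimal integer into binary, octal, and hexadecimal notation. The built-in bin(), oct(), and hex() functions return strings with a 0b, 0o, or 0x prefix, which the code strips with replace().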
n = int(input("Enter a number: "))
print("Binary format:", bin(n).replace("0b", ""))
print("Octal format:", oct(n).replace("0o", ""))
print("Hexadecimal format:", hex(n).replace("0x", ""))
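An alternative sketch uses format specifiers, which produce the digits without any prefix, and int() with a base argument to convert back to decimal. The number is hard-coded as 42 here, instead of read with input(), so the example is self-contained.

```python
n = 42

# Format specifiers produce the digits without a 0b/0o/0x prefix
print("Binary format:", format(n, "b"))  # 101010
print("Octal format:", format(n, "o"))   # 52
print("Hexadecimal format:", f"{n:x}")   # 2a

# int() with a base converts back to decimal
print(int("101010", 2), int("52", 8), int("2a", 16))  # 42 42 42
```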