Natural Language Processing Core Concepts and Techniques

Text Preprocessing and Feature Space Reduction

Which of the following text preprocessing steps can reduce the dimensionality of a bag-of-words feature space?

  • A. Converting all text to lowercase
  • B. Removing common stop words (e.g., “the”, “and”, “of”)
  • D. Stemming or lemmatizing words (e.g., “running” → “run”)
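
As a rough illustration of why these three steps shrink a bag-of-words vocabulary, the sketch below counts distinct tokens after each step. The sample documents, the tiny stop-word set, and the `crude_stem` helper are all assumptions made up for this example; a real pipeline would use a proper stemmer or lemmatizer.

```python
import re

# Tiny illustrative stop-word set (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "of", "a", "is", "in"}

# Made-up sample documents for the demonstration.
DOCS = [
    "The dog is running in the park",
    "Dogs run and the dog runs",
    "the Park of dogs",
]


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer (hypothetical)."""
    for suffix in ("ning", "ing", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vocabulary_sizes(docs):
    """Return the number of distinct tokens after each preprocessing step."""
    raw = [tok for doc in docs for tok in re.findall(r"[A-Za-z]+", doc)]
    lowered = [tok.lower() for tok in raw]
    no_stop = [tok for tok in lowered if tok not in STOP_WORDS]
    stemmed = [crude_stem(tok) for tok in no_stop]
    return {
        "raw": len(set(raw)),
        "lowercased": len(set(lowered)),
        "no_stop_words": len(set(no_stop)),
        "stemmed": len(set(stemmed)),
    }


if __name__ == "__main__":
    for step, size in vocabulary_sizes(DOCS).items():
        print(f"{step}: {size} distinct tokens")
```

Each distinct token is one dimension of the feature space, so every step that merges or drops tokens lowers the dimensionality.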

Limitations of Bag-of-Words Representation

Which of the following are limitations of the bag-of-words (unigram) text representation?

  • A. It ignores the order of words in the text
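
A minimal sketch of the word-order limitation: two sentences with opposite meanings produce identical unigram counts. The `bag_of_words` helper and the example sentences are assumptions chosen for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens; word order is discarded."""
    return Counter(text.lower().split())


sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"

# Same words, same counts, so the two representations are indistinguishable.
print(bag_of_words(sentence_a) == bag_of_words(sentence_b))
```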

Integrating SQL, Understanding NoSQL, and MongoDB Fundamentals

Database Programming: Integrating SQL with Host Languages

Introduction to SQL Integration

This section covers how SQL is used in host languages like C and Java, utilizing techniques such as Embedded SQL, Dynamic SQL, JDBC, and SQLJ. It demonstrates essential steps like connecting to a database, declaring host variables, retrieving data using cursors, and executing dynamic queries.
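
The steps listed above (connect, bind host variables, fetch rows through a cursor) can be sketched by analogy in Python's standard `sqlite3` module. This is a swapped-in illustration, not Embedded SQL, JDBC, or SQLJ; the `employee` table, its columns, and the `names_above` helper are hypothetical.

```python
import sqlite3


def names_above(conn, min_salary):
    """Fetch names row by row through a cursor, binding min_salary as a parameter."""
    cur = conn.cursor()
    # The "?" placeholder plays the role of a host variable in Embedded SQL.
    cur.execute(
        "SELECT name FROM employee WHERE salary > ? ORDER BY name",
        (min_salary,),
    )
    names = []
    for (name,) in cur:  # iterate the cursor one row at a time
        names.append(name)
    cur.close()
    return names


if __name__ == "__main__":
    # Connect to an in-memory database and load made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [("Ada", 5200.0), ("Bob", 3100.0), ("Cy", 4700.0)],
    )
    print(names_above(conn, 4000))
    conn.close()
```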

Embedded SQL (C Example)

Embedded SQL allows direct integration of SQL statements within a host language program (e.g.,


Machine Learning Fundamentals: Algorithms and PyTorch Implementation

Machine Learning Core Concepts

Basic Mathematical Information

Matrix Multiplication

  • If Matrix A has size (m x n) and Matrix B has size (n x p), the resulting product AB has size (m x p).
  • The number of columns in A must equal the number of rows in B.
  • Each entry (i, j) of AB is the dot product of row i of A with column j of B.
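
A minimal pure-Python sketch of these rules follows. The `matmul` helper and the sample matrices are assumptions for illustration; in practice a library such as NumPy or PyTorch would be used.

```python
def matmul(A, B):
    """Multiply an (m x n) matrix A by an (n x p) matrix B, given as lists of rows."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]


A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2

C = matmul(A, B)       # 2 x 2 result
print(C)               # [[58, 64], [139, 154]]
```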

Finding Log Base 2 (n)

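
A common way to find log base 2 of n when only natural logarithms are available is the change-of-base identity log2(n) = ln(n) / ln(2). The sketch below assumes that is the method this heading refers to; the helper name `log2_change_of_base` is hypothetical.

```python
import math


def log2_change_of_base(n):
    """Compute log base 2 of n via the change-of-base identity ln(n) / ln(2)."""
    return math.log(n) / math.log(2)


# Compare against the built-in math.log2 for a few values.
for n in (2, 8, 1024):
    print(n, log2_change_of_base(n), math.log2(n))
```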

Key Machine Learning Definitions

  • Supervised Learning: Models learn from labeled data a hypothesis function that approximates the unknown target function.
  • Classification: Goal is to

Machine Learning & AI Foundations: Definitions, Lifecycle, and Tools

CRISP-ML(Q) Project Lifecycle

  • Definition: A 6-phase framework for managing machine learning projects, with a focus on quality at each step.
  • Phases and Examples:
    1. Business & Data Understanding

      • Definition: Define the business problem and assess available data.
      • Example: Goal: Reduce customer churn by 15%. Data: Purchase history, support tickets.
    2. Data Preparation

      • Definition: Clean, organize, and transform raw data for modeling.
      • Example: Create “age” from “date of birth”; unify country codes like “USA” and

8085 Microprocessor Architecture & Assembly Language Fundamentals

Bus Organization in 8085 Microprocessors

A bus in the 8085 microprocessor is a group of wires used for communication between different components. There are three main types:

  • Data Bus: Carries actual data, like a delivery van.
  • Address Bus: Carries the memory address to access data, like a GPS.
  • Control Bus: Carries control signals (e.g., read/write instructions). These signals coordinate data movement between the CPU, memory, and I/O devices.
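
The division of labor among the three buses can be sketched with a toy model. It relies on the 8085's 16-bit address bus and 8-bit data bus, and it deliberately simplifies the control bus to two string values, "RD" and "WR"; the `ToySystem` class is hypothetical and ignores details such as multiplexed address/data lines and I/O signals.

```python
ADDRESS_BITS = 16  # 8085 address bus width: 2**16 = 65536 locations
DATA_BITS = 8      # 8085 data bus width: one byte per transfer


class ToySystem:
    """Simplified memory model driven by address, data, and control values."""

    def __init__(self):
        self.memory = bytearray(2 ** ADDRESS_BITS)

    def bus_cycle(self, address, control, data=None):
        # Address bus: selects which memory location is accessed.
        if not 0 <= address < 2 ** ADDRESS_BITS:
            raise ValueError("address does not fit on the address bus")
        # Control bus: decides the direction of the transfer.
        if control == "WR":
            # Data bus: carries the byte from the CPU to memory.
            if data is None or not 0 <= data < 2 ** DATA_BITS:
                raise ValueError("data does not fit on the data bus")
            self.memory[address] = data
            return None
        if control == "RD":
            # Data bus: carries the byte from memory back to the CPU.
            return self.memory[address]
        raise ValueError("unknown control signal")


system = ToySystem()
system.bus_cycle(0x2050, "WR", 0x3A)          # write one byte
print(hex(system.bus_cycle(0x2050, "RD")))    # read it back
```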

Memory Addressing & Mapping Fundamentals

Memory addressing


NLP Foundations: From Text Processing to Large Language Models

Week 1: Working with Words

  • Tokenization:


    Splitting text into discrete units (tokens), typically words or punctuation. Techniques vary (simple split on whitespace vs. advanced tokenizers); challenges include handling punctuation, contractions, multi-word names, and different languages (e.g., Chinese has no spaces). Good tokenization is foundational for all NLP tasks.
  • Bag-of-Words (BoW):


    Representing a document by the counts of each word in a predefined vocabulary, ignoring order. The vocabulary is