Natural Language Processing Core Concepts and Techniques
Text Preprocessing and Feature Space Reduction
Which of the following text preprocessing steps can reduce the dimensionality of a bag-of-words feature space?
- A. Converting all text to lowercase
- B. Removing common stop words (e.g., “the”, “and”, “of”)
- D. Stemming or lemmatizing words (e.g., “running” → “run”)
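As a rough illustration of how these steps shrink the vocabulary, the sketch below counts distinct tokens in a toy corpus before and after each step. The corpus, the stop-word list, and the `naive_stem` helper are all hypothetical and chosen only for this example; a real pipeline would normally use a proper stemmer or lemmatizer.

```python
import re

# Hypothetical toy corpus and stop-word list, for illustration only.
DOCS = [
    "The cat is running and the dogs are running",
    "A dog runs and THE cats run",
]
STOP_WORDS = {"the", "and", "of", "a", "is", "are"}


def naive_stem(word):
    """Very rough suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ning", "ing", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vocabulary(docs, lowercase=False, remove_stops=False, stem=False):
    """Return the set of distinct tokens after the chosen preprocessing steps."""
    vocab = set()
    for doc in docs:
        for token in re.findall(r"[A-Za-z]+", doc):
            if lowercase:
                token = token.lower()
            if remove_stops and token.lower() in STOP_WORDS:
                continue
            if stem:
                token = naive_stem(token)
            vocab.add(token)
    return vocab


if __name__ == "__main__":
    print(len(vocabulary(DOCS)))
    print(len(vocabulary(DOCS, lowercase=True)))
    print(len(vocabulary(DOCS, lowercase=True, remove_stops=True)))
    print(len(vocabulary(DOCS, lowercase=True, remove_stops=True, stem=True)))
```

On this toy corpus the vocabulary size falls from 14 to 12 to 7 to 3 as the steps are stacked, since each step merges or drops tokens that would otherwise occupy their own feature columns.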
Limitations of Bag-of-Words Representation
Which of the following are limitations of the bag-of-words (unigram) text representation?
- A. It ignores the order of words in the text
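The word-order limitation can be seen in a minimal sketch: two sentences with opposite meanings map to exactly the same unigram counts. The sentences and the whitespace-based `bag_of_words` helper are assumptions made for this example.

```python
from collections import Counter


def bag_of_words(text):
    """Unigram counts using a simple lowercase + whitespace split."""
    return Counter(text.lower().split())


# Two hypothetical sentences with the same words in a different order.
a = "the dog bit the man"
b = "the man bit the dog"

# The texts differ, but their bag-of-words representations are identical.
same_bag = bag_of_words(a) == bag_of_words(b)

if __name__ == "__main__":
    print(a != b, same_bag)
```

Any model that sees only these counts cannot tell the two sentences apart.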
Integrating SQL, Understanding NoSQL, and MongoDB Fundamentals
Database Programming: Integrating SQL with Host Languages
Introduction to SQL Integration
This section covers how SQL is used in host languages like C and Java, utilizing techniques such as Embedded SQL, Dynamic SQL, JDBC, and SQLJ. It demonstrates essential steps like connecting to a database, declaring host variables, retrieving data using cursors, and executing dynamic queries.
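The steps named above (connect, bind host-language values, iterate a cursor, build a query at run time) can be sketched in Python as a rough analogue. This is not Embedded SQL, JDBC, or SQLJ: it uses Python's standard `sqlite3` module with an in-memory database, and the `employee` table, its rows, and the `fetch_names_above` function are hypothetical.

```python
import sqlite3

# Columns a caller may sort by; a whitelist keeps the dynamic query safe.
ALLOWED_SORT = {"name", "salary"}


def fetch_names_above(min_salary, sort_by="name"):
    """Return employee names with salary above min_salary, sorted by sort_by."""
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"unsupported sort column: {sort_by}")

    conn = sqlite3.connect(":memory:")  # step 1: connect to a database
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE employee (name TEXT, salary REAL)")
        cur.executemany(
            "INSERT INTO employee VALUES (?, ?)",
            [("Ana", 5200.0), ("Ben", 3100.0), ("Chloe", 4700.0)],
        )

        # Step 2: the query text is assembled at run time (dynamic SQL),
        # while min_salary is bound through a placeholder, playing the
        # role of a host variable.
        query = f"SELECT name FROM employee WHERE salary > ? ORDER BY {sort_by}"
        cur.execute(query, (min_salary,))

        # Step 3: walk the result set row by row through the cursor.
        names = []
        for (name,) in cur:
            names.append(name)
        return names
    finally:
        conn.close()


if __name__ == "__main__":
    print(fetch_names_above(4000))
    print(fetch_names_above(4000, sort_by="salary"))
```

Binding values through placeholders rather than string concatenation is the same idea host variables serve in Embedded SQL: the SQL text and the program's data stay separate.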
Embedded SQL (C Example)
Embedded SQL allows direct integration of SQL statements within a host language program (e.g.,
Machine Learning Fundamentals: Algorithms and PyTorch Implementation
Machine Learning Core Concepts
Basic Mathematical Information
Matrix Multiplication
- If Matrix A has size (m x n) and Matrix B has size (n x p), the resulting product AB has size (m x p).
- The number of columns in A must equal the number of rows in B.
- Each entry (i, j) of AB is the dot product of row i of A with column j of B.
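The shape rule and the row-by-column calculation can be checked with a small pure-Python sketch. The `matmul` function and the example matrices are illustrative assumptions; in practice a library such as NumPy or PyTorch would be used.

```python
def matmul(a, b):
    """Multiply an (m x n) matrix by an (n x p) matrix given as nested lists."""
    m, n = len(a), len(a[0])
    if len(b) != n:
        raise ValueError("columns of A must equal rows of B")
    p = len(b[0])
    # Entry (i, j) is the dot product of row i of A and column j of B.
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]


A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2
C = matmul(A, B)       # 2 x 2

if __name__ == "__main__":
    print(C)
```

Here C comes out as [[58, 64], [139, 154]]; for instance, the top-left entry is 1·7 + 2·9 + 3·11 = 58.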
Finding Log Base 2 (n)
Key Machine Learning Definitions
- Supervised Learning: Models learn from labeled data a hypothesis function that approximates the unknown target function.
- Classification: Goal is to
Machine Learning & AI Foundations: Definitions, Lifecycle, and Tools
CRISP-ML(Q) Project Lifecycle
- Definition: A 6-phase framework for managing machine learning projects, with a focus on quality at each step.
- Phases and Examples:
Business & Data Understanding
- Definition: Define the business problem and assess available data.
- Example: Goal: Reduce customer churn by 15%. Data: Purchase history, support tickets.
Data Preparation
- Definition: Clean, organize, and transform raw data for modeling.
- Example: Create “age” from “date of birth”; unify country codes like “USA” and
8085 Microprocessor Architecture & Assembly Language Fundamentals
Bus Organization in 8085 Microprocessors
A bus in the 8085 microprocessor is a group of wires used for communication between different components. There are three main types:
- Data Bus: Carries actual data, like a delivery van.
- Address Bus: Carries the memory address to access data, like a GPS.
- Control Bus: Carries control signals (e.g., read/write instructions). These signals coordinate data movement between the CPU, memory, and I/O devices.
Memory Addressing & Mapping Fundamentals
Memory addressing
NLP Foundations: From Text Processing to Large Language Models
Week 1: Working with Words
Tokenization:
Splitting text into discrete units (tokens), typically words or punctuation. Techniques vary (simple split on whitespace vs. advanced tokenizers); challenges include handling punctuation, contractions, multi-word names, and different languages (e.g., Chinese has no spaces). Good tokenization is foundational for all NLP tasks.
Bag-of-Words (BoW):
Representing a document by the counts of each word in a predefined vocabulary, ignoring order. The vocabulary is
