Natural Language Processing (NLP): Definitions, Applications & Techniques

NLP — Definition & Applications

Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that enables computers to understand, analyze, and generate human language (text or speech). NLP serves as a bridge between human language and computer language.

Applications of NLP (Explain Any Four)

1. Machine Translation

Machine translation automatically converts text from one language to another using NLP techniques.

Example: Google Translate
Use: Helps people communicate across languages.

2. Sentiment Analysis

Sentiment analysis identifies emotion or opinion in text.

Types: Positive / Negative / Neutral
Example: Product and movie reviews
Use: Customer feedback analysis and reputation monitoring.

3. Chatbots / Virtual Assistants

Chatbots and virtual assistants interact with users using natural language.

Example: Customer support chatbots, Alexa, Google Assistant
Use: 24×7 support; reduces manual work and improves response speed.

4. Speech Recognition

Speech recognition converts spoken words into text using NLP techniques.

Example: Voice typing, voice assistants
Use: Hands-free interaction and accessibility.

NLP in Real-World Applications

NLP is widely used in translation, chatbots, sentiment analysis, and many other applications, making human-computer interaction easier and more efficient.

Purpose of Text Normalization in NLP

Text normalization is a preprocessing step used to convert raw text into a standard, clean form so machines can process it reliably. Real-world text is often noisy and inconsistent; normalization reduces variation, removes unnecessary information, and improves the accuracy of NLP models.

Main purposes:

  • Make text uniform and consistent
  • Reduce vocabulary size
  • Improve performance of NLP algorithms
  • Remove noise from text data

Common Tasks in Text Normalization

1. Tokenization

Tokenization is the process of breaking text into smaller units called tokens.

Example: Sentence: “I love NLP” — Tokens: I | love | NLP

Purpose:

  • Helps in word-level analysis
  • Forms the base for further NLP tasks
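
The splitting step above can be sketched in a few lines of Python. This is a minimal whitespace/regex tokenizer for illustration, not a full NLP tokenizer (real tokenizers also handle punctuation, contractions, and special tokens):

```python
import re

def tokenize(text):
    """Split text into word tokens; punctuation is dropped.
    A minimal regex tokenizer for illustration only."""
    return re.findall(r"[A-Za-z']+", text)

print(tokenize("I love NLP"))  # ['I', 'love', 'NLP']
```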

2. Stop Word Removal

Stop words are common words that do not add much meaning to a sentence.

Examples: is, am, the, a, an, and, of

Purpose:

  • Reduces text size
  • Improves processing speed
  • Removes unnecessary words
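
A sketch of stop word removal, using the small illustrative stop word list from the examples above (real systems use much larger lists, e.g. NLTK's):

```python
STOP_WORDS = {"is", "am", "the", "a", "an", "and", "of"}  # small illustrative list

def remove_stop_words(tokens):
    """Drop common function words; comparison is case-insensitive."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "dogs", "and", "the", "cats"]))
# ['dogs', 'cats']
```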

3. Stemming / Lemmatization

Stemming and lemmatization reduce words to their base or root form.

Examples: running → run, studies → study

Difference (one line): Stemming mechanically strips suffixes and may produce non-words; lemmatization uses vocabulary and morphology to return a valid dictionary form (the lemma).

Purpose:

  • Reduce vocabulary size
  • Improve matching of similar words
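
A deliberately crude suffix-stripping stemmer, to show the idea behind the examples above. The suffix list and the doubled-consonant rule are illustrative assumptions; real stemmers such as the Porter stemmer use many more rules:

```python
def crude_stem(word):
    """Very crude suffix stripping for illustration only."""
    for suffix in ("ies", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            if suffix == "ies":
                return stem + "y"        # studies -> study
            if len(stem) > 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]         # runn -> run
            return stem
    return word

print(crude_stem("running"))  # run
print(crude_stem("studies"))  # study
```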

Text normalization plays an important role in NLP by cleaning and standardizing text data, which improves the accuracy and efficiency of NLP systems.

Minimum Edit Distance Algorithm

The Minimum Edit Distance (MED) algorithm computes the minimum number of operations required to convert one string into another.

Allowed operations: Insertion — add a character; Deletion — remove a character; Substitution — replace a character. Each operation has cost = 1.

Purpose of Minimum Edit Distance: Spell checking, autocorrect systems, machine translation, and measuring similarity between words.

Algorithm (Working of MED)

  1. Create a matrix of size (m+1) × (n+1), where m = length of source word and n = length of target word.
  2. Initialize the first row from 0 to n and the first column from 0 to m.
  3. Fill the matrix using:

Deletion → top cell + 1
Insertion → left cell + 1
Substitution → diagonal cell + cost
cost = 0 (if characters same), cost = 1 (if characters different)

Final cell gives the Minimum Edit Distance.
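
The three steps above translate directly into a dynamic-programming table. A sketch with unit costs, as defined earlier:

```python
def min_edit_distance(source, target):
    """Edit distance with cost 1 for insertion, deletion, substitution."""
    m, n = len(source), len(target)
    # (m+1) x (n+1) matrix; D[i][j] = distance between the prefixes
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i          # first column: 0..m
    for j in range(n + 1):
        D[0][j] = j          # first row: 0..n
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,         # deletion (top cell + 1)
                D[i][j - 1] + 1,         # insertion (left cell + 1)
                D[i - 1][j - 1] + cost,  # substitution (diagonal + cost)
            )
    return D[m][n]

print(min_edit_distance("EXECUTION", "INTENTION"))  # 5
```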

Compute Minimum Edit Distance (Example)

Given words: Word 1: EXECUTION Word 2: INTENTION

[Edit-distance matrix figure omitted.] Minimum Edit Distance between EXECUTION and INTENTION = 5

Explain Vector Space Model of Information Retrieval

The Vector Space Model (VSM) is used in Information Retrieval (IR) to represent documents and queries as vectors in a multi-dimensional space. Each dimension corresponds to a term from the collection, and term values are typically computed using TF-IDF weighting.

Working of Vector Space Model

  1. Convert all documents into vectors.
  2. Each word becomes a dimension in vector space.
  3. Assign weights to terms using TF-IDF.
  4. Convert the user query into a vector.
  5. Calculate similarity between document and query using cosine similarity.

Cosine similarity measures the angle between two vectors: a smaller angle indicates more similarity, and documents with higher similarity scores are ranked higher.
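
The five steps above can be sketched in plain Python. The documents, query, and IDF formula (log(N/df) + 1) are illustrative assumptions, not a fixed standard:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(docs, query):
    """Rank documents against a query using TF-IDF vectors."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    idf = {w: math.log(n / sum(w in t for t in tokenized)) + 1 for w in vocab}

    def vectorize(tokens):
        tf = Counter(tokens)
        return [tf[w] * idf[w] for w in vocab]  # query words outside vocab are ignored

    q_vec = vectorize(query.lower().split())
    scores = [(cosine(vectorize(t), q_vec), d) for t, d in zip(tokenized, docs)]
    return sorted(scores, reverse=True)  # highest similarity first

docs = ["search engines rank documents",
        "nlp makes search easy",
        "cats sleep all day"]
for score, doc in rank(docs, "search documents"):
    print(round(score, 3), doc)
```

The document sharing both query terms ranks first; the one with no overlap scores 0.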

Advantages of Vector Space Model

  • Simple and easy to understand
  • Supports partial matching
  • Provides a ranked result list

Disadvantages of Vector Space Model

  • Ignores word order
  • Does not capture semantic meaning
  • High-dimensional vector space

The Vector Space Model is widely used in search engines because it provides efficient and ranked retrieval of documents based on similarity to the user query.

What Is a Language Model? N-Gram Language Model

Language Model: A probabilistic model in NLP that predicts the next word in a sequence based on previous words. It assigns probabilities to word sequences and helps machines understand how language is formed.

Need of a Language Model

  • Speech recognition
  • Machine translation
  • Text generation
  • Autocomplete systems

N-Gram Language Model

The N-gram model predicts a word using the previous (N−1) words, where N is the number of words in the window considered.

Types of N-Gram Models

Unigram (N=1): uses no previous context; each word is treated independently
Bigram (N=2): predicts a word from the previous one word
Trigram (N=3): predicts a word from the previous two words

Working of N-Gram Model

The probability of a word depends only on the last (n-1) words and uses frequency counts from training data (e.g., P(word | previous words)).
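
A minimal bigram model built from frequency counts, as described above. The toy training corpus is an assumption for illustration:

```python
from collections import Counter

corpus = "i love nlp i love coding nlp is fun".split()

# Count unigrams and bigrams from the training text
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    """P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(bigram_prob("i", "love"))    # 1.0  ("i" is always followed by "love")
print(bigram_prob("love", "nlp"))  # 0.5  ("love" is followed by "nlp" half the time)
```

Unseen bigrams get probability 0 here, which is exactly the data-sparsity problem noted below; real systems apply smoothing.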

Advantages of N-Gram Model

  • Simple and easy to implement
  • Fast computation
  • Works well for small datasets

Limitations of N-Gram Model

  • Requires large data for larger n
  • Suffers from data sparsity
  • Cannot capture long-term context

The N-gram model is a basic but important approach in NLP and serves as a foundation for more advanced language models.

Relevance Ranking Algorithm

Relevance ranking algorithms in Information Retrieval sort documents by their relevance to a user query. When a user submits a query, many documents may match; ranking algorithms compute similarity and display the most relevant documents first.

Purpose: Provide the most relevant results to users, sort documents by importance, and improve search accuracy and efficiency.

Working: Query preprocessing (tokenization, stop word removal) → represent documents using TF-IDF or Vector Space Model → compute similarity (e.g., cosine similarity) → rank documents by score.

Advantages: Produces ranked results and improves search accuracy; widely used in search engines.


Parse Tree Example (Top-Down Expansion)

Start with the start symbol: S → NP VP

Expand NP: NP → ART N (choose this because the sentence is The dogs cried)

So, NP → ART N → The dogs

Expand VP: VP → V → cried

Combine

S
├── NP
│   ├── ART → The
│   └── N → dogs
└── VP
    └── V → cried

Explanation: Start from S (sentence). Expand NP first (top-down). NP → ART + N matches “The dogs”. Then expand VP → V to match “cried”. Combine NP and VP to complete the sentence parse: “The dogs cried.”
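
The top-down expansion above can be sketched as a tiny recursive parser. The grammar and lexicon encode exactly the rules used in the worked example:

```python
# Grammar and lexicon taken from the worked example above
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"]],
    "VP": [["V"]],
}
LEXICON = {"ART": {"the"}, "N": {"dogs"}, "V": {"cried"}}

def parse(symbol, tokens, pos):
    """Top-down: try to expand `symbol` starting at tokens[pos].
    Returns (tree, next_pos) on success, or None on failure."""
    if symbol in LEXICON:  # pre-terminal: match a word
        if pos < len(tokens) and tokens[pos] in LEXICON[symbol]:
            return (symbol, tokens[pos]), pos + 1
        return None
    for rule in GRAMMAR.get(symbol, []):  # try each production
        children, p = [], pos
        for child in rule:
            result = parse(child, tokens, p)
            if result is None:
                break
            subtree, p = result
            children.append(subtree)
        else:
            return (symbol, children), p
    return None

tree, end = parse("S", "the dogs cried".lower().split(), 0)
print(tree)
# ('S', [('NP', [('ART', 'the'), ('N', 'dogs')]), ('VP', [('V', 'cried')])])
```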

Importance of Sentiment Analysis

Sentiment analysis identifies the emotion or opinion in text (positive, negative, or neutral) and helps extract insights from large volumes of text data.

1. Customer Feedback Analysis

  • Helps companies analyze customer reviews
  • Identifies customer satisfaction levels
  • Improves product and service quality

2. Business Decision Making

  • Supports better organizational decisions
  • Helps understand market response
  • Reduces business risk

3. Brand Reputation Monitoring

  • Tracks public opinion about a brand
  • Identifies negative feedback early
  • Helps maintain brand image

4. Social Media Analysis

  • Analyzes comments and posts
  • Understands public mood
  • Helps in trend analysis

5. Product and Movie Reviews

  • Classifies reviews as positive or negative
  • Helps users make decisions
  • Supports product improvements

Sentiment analysis is widely used across business, marketing, and social media to understand opinions and emotions in text data.

Text Classification

Text classification is a process in NLP where text documents are automatically assigned to predefined categories. It is used in spam detection, sentiment analysis, news classification, and email filtering. The goal is to make machines understand and correctly classify text.

How Text Classification Works

  1. Text collection: Gather the text data to classify (emails, reviews, articles).
  2. Text preprocessing: Clean text by lowercasing, removing punctuation, tokenizing, removing stop words, and applying stemming or lemmatization.
  3. Feature extraction: Convert text into numeric features using Bag of Words (BoW), TF-IDF, or word embeddings.
  4. Model training: Train a machine learning model (Naive Bayes, SVM, Logistic Regression, or deep learning) on labeled data.
  5. Classification/prediction: Use the trained model to predict categories for new text.
  6. Evaluation: Evaluate performance with accuracy, precision, recall, and F1-score.

Applying Logistic Regression for Text Classification

  1. Data collection: Collect and split data into training and testing sets.
  2. Text preprocessing: Lowercase, remove punctuation and numbers, remove stop words, tokenize, and apply stemming or lemmatization.
  3. Feature extraction: Convert text into numeric features with BoW or TF-IDF.
  4. Model training: Train a Logistic Regression model to learn the relationship between features and categories. Effective for binary and multi-class classification.
  5. Prediction: Model outputs probabilities for each class and assigns the most likely label.
  6. Evaluation: Measure accuracy, precision, recall, and F1-score; refine preprocessing or features if needed.
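
The pipeline above can be sketched end to end in pure Python: a bag-of-words vectorizer plus a binary logistic regression trained by gradient descent. The tiny spam/ham training set and hyperparameters (learning rate, epochs) are assumptions for illustration; real systems would use a library such as scikit-learn:

```python
import math

# Toy labeled data (assumed for illustration): 1 = spam, 0 = not spam
train = [("win money now", 1), ("free prize win", 1),
         ("meeting at noon", 0), ("see you at lunch", 0)]

vocab = sorted({w for text, _ in train for w in text.split()})

def bow(text):
    """Bag-of-words count vector over the training vocabulary."""
    toks = text.split()
    return [toks.count(w) for w in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Train with per-example gradient descent on the log-loss
weights = [0.0] * len(vocab)
bias = 0.0
lr = 0.5
for _ in range(200):
    for text, label in train:
        x = bow(text)
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        err = pred - label
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]
        bias -= lr * err

def predict(text):
    """Probability that `text` belongs to class 1 (spam)."""
    x = bow(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

print(round(predict("win a free prize"), 2))       # close to 1 (spam)
print(round(predict("lunch meeting at noon"), 2))  # close to 0 (not spam)
```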

NLP Text Summarization

Text summarization converts a long document into a shorter summary while preserving the main meaning and important information. The primary aims are to save time, reduce text length, and help users quickly understand content.

Types of Text Summarization: There are two main approaches:

1. Extractive Text Summarization

  • Selects important sentences directly from the original text.
  • Does not create new sentences.
  • Works by finding sentences with high importance or score.
  • Common techniques: TF-IDF, TextRank.
  • Simple and reliable, though summaries can sometimes lack smoothness.

Example: Selecting three important sentences from a long paragraph.
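
A minimal extractive sketch: sentences are scored by summed word frequency, a simple stand-in for the TF-IDF/TextRank scoring mentioned above. The sample text is an assumption, and this scoring favors longer sentences, one reason real systems normalize or use TextRank:

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Pick the top-scoring sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))
    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in top)

text = ("NLP systems process text. Summarization shortens text while "
        "keeping key information. Some sentences add little. "
        "Extractive methods pick important sentences from the text.")
print(extractive_summary(text))
```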

2. Abstractive Text Summarization

  • Generates new sentences to express the core meaning.
  • Understands the content and then produces a concise summary.
  • Similar to how humans write summaries.
  • Advanced models used: Seq2Seq, Transformer models (T5, BART, Pegasus).
  • Produces more fluent and meaningful summaries but is more complex.

Text summarization is a useful NLP technique for handling large volumes of text efficiently.