Natural Language Processing Fundamentals and Applications

Understanding Ambiguity in NLP

Ambiguity occurs when a word, phrase, or sentence has more than one possible meaning. It is present at all levels of NLP (lexical, syntactic, semantic, discourse, and pragmatic).

  • Example 1: “The chicken is ready to eat” – is the chicken (food) ready to be eaten, or is the chicken (bird) ready to eat something?
  • Example 2: “The man saw the girl with the telescope” – who has the telescope?

Types of Ambiguity

  1. Lexical Ambiguity – A word having multiple meanings (e.g., bat, bank).
  2. Syntactic (Structural) Ambiguity – Multiple possible parse structures (attachment and scope ambiguity).
  3. Semantic Ambiguity – Even after syntax is resolved, sentences have multiple interpretations.
  4. Discourse Ambiguity – Ambiguity in pronoun/reference resolution (“The horse ran up the hill. It was steep.”).
  5. Pragmatic Ambiguity – Meaning depends on context, intention, beliefs, and real-world knowledge.

Ambiguity is one of the major challenges in NLP and needs techniques like POS-tagging, probabilistic models, and Word Sense Disambiguation (WSD) to resolve.
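
As a small illustration of WSD, the sketch below applies NLTK’s simplified Lesk algorithm to disambiguate the word “bank” (this assumes NLTK and its WordNet data are available, and Lesk is only one of many WSD techniques):

```python
# Word Sense Disambiguation with NLTK's simplified Lesk algorithm.
# Assumes NLTK is installed and the 'wordnet' data package has been downloaded.
from nltk.wsd import lesk

context = "I went to the bank to deposit my money".split()
sense = lesk(context, "bank")

print(sense)               # whichever WordNet synset Lesk selects for "bank"
print(sense.definition())  # gloss (dictionary definition) of that sense
```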

Text Classification and Categorization

Text Classification (also called text categorization) is the task of automatically assigning documents to predefined categories.

Examples of Text Classification

  • Spam vs. non-spam email
  • News categorization (sports, politics, entertainment)
  • Sentiment classification (positive/negative)

Uses of Text Classification

  1. Filtering content
  2. Spam filtering
  3. Survey coding
  4. Topic spotting
  5. Document identification

It is commonly used in information retrieval, sentiment analysis, and many NLP applications.
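
A minimal sketch of how such a classifier can be built with scikit-learn (assumed installed; the tiny spam/ham data set below is invented for illustration):

```python
# Bag-of-words Naive Bayes text classifier on a toy spam/ham data set.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "win a free prize now",               # spam
    "limited offer click here",           # spam
    "meeting rescheduled to monday",      # ham
    "please review the attached report",  # ham
]
train_labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

print(model.predict(["claim your free offer now"]))        # expected: ['spam']
print(model.predict(["please review the meeting notes"]))  # expected: ['ham']
```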

Finite State Transducers (FST)

A Finite State Transducer (FST) is an extension of a finite automaton that maps between two levels of representation. It processes input strings and produces corresponding output strings.

Key Features of FST

  • Used in morphological parsing, stemming, phonological rules, etc.
  • Consists of states, transitions, input symbols, and output symbols.
  • Each transition reads an input symbol and writes an output symbol (either may be the empty symbol ε).

Examples of FST Applications

  • Converting cats → cat + plural
  • Mapping base form to inflected form or vice-versa
  • Lexicon construction and Porter stemming implementation

FSTs are powerful tools in NLP for analyzing and generating word forms.
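
The sketch below hand-codes a tiny transducer as a transition table to show the idea; the states and symbols are made up for illustration and are not taken from any standard FST toolkit:

```python
# A toy FST that maps surface forms to lexical forms, e.g. "cats" -> "cat+N+PL".
transitions = {
    # (state, input symbol) -> (next state, output string)
    ("q0", "c"): ("q1", "c"),
    ("q1", "a"): ("q2", "a"),
    ("q2", "t"): ("q3", "t"),
    ("q3", "s"): ("q4", "+N+PL"),  # the plural 's' is rewritten as morphological tags
}
final_output = {"q3": "+N+SG", "q4": ""}  # extra output emitted in accepting states

def transduce(word):
    state, output = "q0", ""
    for symbol in word:
        state, out = transitions[(state, symbol)]  # KeyError = word rejected
        output += out
    return output + final_output[state]

print(transduce("cat"))   # cat+N+SG
print(transduce("cats"))  # cat+N+PL
```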

Stemming Techniques in Text Processing

Stemming is the process of reducing a word to its root form by removing prefixes or suffixes without considering the context. It is an approximation method and may produce non-dictionary words.

Examples of Stemming

  • studies → studi
  • writing → writ

Characteristics of Stemming

  • Fast and simple
  • Accuracy is lower than lemmatization
  • Works well when exact meaning is not important (e.g., spam detection)

Common Algorithm: The Porter Stemmer uses a series of rules to strip suffixes.
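
A quick way to try this is NLTK’s implementation of the Porter Stemmer (NLTK assumed installed); note that the outputs are stems, not necessarily dictionary words:

```python
# Stemming a few words with NLTK's PorterStemmer.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["studies", "caresses", "ponies", "running"]:
    print(word, "->", stemmer.stem(word))
# studies -> studi, caresses -> caress, ponies -> poni, running -> run
```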

Key Applications of Natural Language Processing

NLP has a wide range of applications as listed in the study material:

  1. Machine Translation – e.g., Google Translate
  2. Information Retrieval – search engines like Google, Yahoo
  3. Text Categorization – spam filtering, content filtering
  4. Information Extraction – extracting structured data from unstructured text
  5. Grammar Checking – spelling and grammar correction (e.g., MS Word)
  6. Sentiment Analysis – detecting emotions/opinions in text
  7. Question Answering Systems – systems that answer natural language questions
  8. Spam Detection – identifying unwanted emails
  9. Chatbots – customer service bots
  10. Speech Recognition – converting speech to text
  11. Text Summarization – generating summaries of long documents

These applications demonstrate the importance of NLP in real-world systems.

N-Gram Statistical Language Models

An N-Gram model is a statistical language model that predicts the next word in a sequence based on the previous N–1 words. It is used to estimate the probability of word sequences.

Key Points of N-Grams

  • An N-gram is a contiguous sequence of n items (words or characters).
  • Unigram (1-gram): P(w₁)
  • Bigram (2-gram): P(w₂ | w₁)
  • Trigram (3-gram): P(w₃ | w₁ w₂)
  • Used in language modeling, spelling correction, text generation, speech recognition, etc.

Purpose: To compute probabilities of sentences and help predict the most likely word sequence.
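
A minimal bigram sketch using maximum-likelihood estimates over a tiny invented corpus (counts only; real models add smoothing for unseen N-grams):

```python
# Bigram language model: P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}).
from collections import Counter

corpus = ["<s> I like NLP </s>",
          "<s> I like coffee </s>",
          "<s> I love NLP </s>"]

unigram_counts, bigram_counts = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("I", "like"))    # 2/3: "I like" occurs twice, "I" three times
print(bigram_prob("like", "NLP"))  # 1/2: "like NLP" occurs once, "like" twice
```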

Prepositional Phrases and Syntactic Structure

A Prepositional Phrase (PP) is a syntactic structure consisting of a preposition (P) followed by a noun phrase (NP).

Structure: PP = P + NP, where the NP is typically D + N (Determiner + Noun).
Example: “on the table”, “in the city”.

Role in Sentences

  • Adds additional information (place, time, manner).
  • Attaches either to the noun phrase or verb phrase, often causing attachment ambiguity.

Example: “The man saw the girl with the telescope.” – “with the telescope” can attach to the verb “saw” (the man used the telescope) or to the noun phrase “the girl” (the girl had the telescope). Prepositional phrases play a major role in syntactic analysis and semantic interpretation.
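
The attachment ambiguity can be made explicit with a small context-free grammar; the toy grammar below is a hand-written assumption, and NLTK’s chart parser (assumed installed) returns one parse tree per attachment:

```python
# Two parses for the same sentence: PP attached to the VP or to the object NP.
import nltk

grammar = nltk.CFG.fromstring("""
  S  -> NP VP
  VP -> V NP | V NP PP
  NP -> Det N | Det N PP
  PP -> P NP
  Det -> 'the'
  N  -> 'man' | 'girl' | 'telescope'
  V  -> 'saw'
  P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
tokens = "the man saw the girl with the telescope".split()
for tree in parser.parse(tokens):
    print(tree)  # one tree per reading of "with the telescope"
```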

Text Summarization Methods

Text summarization is the process of producing a short and meaningful summary of a longer document while preserving its core content.

Types of Summarization

  • Extractive Summarization – selecting important sentences from the text.
  • Abstractive Summarization – generating new sentences based on an understanding of the content.

Methods and Algorithms

  • LexRank – a graph-based algorithm for extractive summarization (see the sketch after this list).
  • Optimization-based approaches – optimize objective functions to produce concise summaries.
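
A LexRank-style sketch under simplifying assumptions (scikit-learn and networkx assumed installed; the real algorithm thresholds similarities and builds the graph more carefully):

```python
# Graph-based extractive summarization: sentences are nodes, edges are
# TF-IDF cosine similarities, and PageRank centrality picks the summary.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Text summarization shortens documents while keeping the core content.",
    "Extractive methods select the most important sentences from the text.",
    "Abstractive methods generate new sentences based on understanding.",
    "Football is a popular sport played all over the world.",
]

tfidf = TfidfVectorizer().fit_transform(sentences)
similarity = cosine_similarity(tfidf)

graph = nx.from_numpy_array(similarity)   # weighted sentence-similarity graph
scores = nx.pagerank(graph)               # centrality score per sentence

top = sorted(scores, key=scores.get, reverse=True)[:2]
for index in sorted(top):                 # keep original sentence order
    print(sentences[index])
```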

Applications of Summarization

  • News summarization
  • Research paper summaries
  • Search engine snippets
  • Email or report reduction

The goal is to reduce reading time while keeping the essential meaning intact.

Sentiment Analysis and Opinion Mining

Sentiment Analysis, also called Opinion Mining, is the process of identifying and classifying the emotional tone expressed in a text. It determines whether the sentiment is positive, negative, or neutral.

Key Points of Sentiment Analysis

  • Used to analyze behavior, attitude, and the emotional state of the user.
  • Implemented using a combination of NLP and statistics.
  • Works by assigning polarity values (positive/negative/neutral) to words, phrases, or sentences.
  • Uses affective lexicons, machine learning models, or hybrid approaches.
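
As a concrete lexicon-based example, NLTK ships the VADER analyzer; the sketch below assumes NLTK and its vader_lexicon data package are installed:

```python
# Lexicon-based sentiment scoring with NLTK's VADER analyzer.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
reviews = [
    "I love this phone, the camera is great!",
    "The battery is terrible and it keeps crashing.",
]
for text in reviews:
    scores = analyzer.polarity_scores(text)  # neg/neu/pos plus a compound score
    if scores["compound"] >= 0.05:
        label = "positive"
    elif scores["compound"] <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    print(label, scores)
```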

Applications of Sentiment Analysis

  • Product review analysis
  • Social media monitoring
  • Customer feedback processing
  • Market analysis

Sentiment Analysis helps machines understand human emotions within text.

Porter’s Stemming Algorithm

Porter’s Stemmer is the most widely used stemming algorithm for English. It removes common suffixes from words to reduce them to their root or stem form.

Key Points of Porter’s Stemmer

  • Works using a sequence of five phases of rule-based suffix stripping.
  • Each phase applies a set of conditions to remove or replace endings like -ing, -ed, -ly, -ation, etc.
  • Produces stems that may not always be real dictionary words (because stemming is approximate).
  • Example: “studies” → “studi”, “writing” → “writ”.
  • Used where exact word meaning is not needed, such as information retrieval, spam detection, and search engines.

The Porter Stemmer is efficient, simple, and helps reduce morphological variations of words.
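
To illustrate the flavour of rule-based suffix stripping, the sketch below implements a few of the Step 1a rewrite rules only; the full algorithm has five phases with measure-based conditions, so this is illustrative rather than a complete implementation:

```python
# A few Porter Step 1a suffix rules, applied as ordered rewrite patterns.
import re

RULES = [
    (r"sses$", "ss"),  # caresses -> caress
    (r"ies$",  "i"),   # studies  -> studi
    (r"ss$",   "ss"),  # caress   -> caress (unchanged)
    (r"s$",    ""),    # cats     -> cat
]

def strip_suffix(word):
    for pattern, replacement in RULES:
        if re.search(pattern, word):  # first matching rule wins
            return re.sub(pattern, replacement, word)
    return word

for word in ["caresses", "studies", "cats", "caress"]:
    print(word, "->", strip_suffix(word))
# caresses -> caress, studies -> studi, cats -> cat, caress -> caress
```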