NLP Fundamentals: Morphology, Semantics, and Parsing
Word Structure and Components in NLP
In linguistics and Natural Language Processing (NLP), word structure refers to how a word is internally organized from meaningful building blocks. Words are not always indivisible; many are formed by combining smaller units called morphemes, the smallest units of meaning.
Components of Word Structure
- Root / Base: The core element carrying the primary meaning. Example: play in replay, player, and playful.
- Stem: The form to which affixes attach. It may be just the root or a root plus a derivational affix. Example: play is a stem; player is the stem for players.
- Affixes: Bound morphemes added to modify meaning or grammar.
  - Prefix (before the stem): unhappy, replay
  - Suffix (after the stem): happiness, played
  - Infix (inside the stem): rare in English
- Derivational Morphemes: Create new words or change the word class. Example: happy (adjective) → happiness (noun).
- Inflectional Morphemes: Modify grammatical features without changing the core meaning or class. Example: play → played → playing.
Example Breakdown: Unhappiness
- un- (prefix, negation)
- happy (root)
- -ness (suffix, noun formation)
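The breakdown above can be sketched as a simple rule-based segmenter. The prefix and suffix inventories below are illustrative, not an exhaustive list of English affixes:

```python
# A minimal sketch of rule-based morpheme segmentation.
# The affix inventories are illustrative, not exhaustive.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ful", "er", "ing", "ed", "s"]

def segment(word):
    """Strip at most one known prefix and one known suffix."""
    morphemes = []
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + 2:
            morphemes.append(p + "-")
            word = word[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s) + 2:
            suffix = "-" + s
            word = word[:-len(s)]
            break
    morphemes.append(word)
    if suffix:
        morphemes.append(suffix)
    return morphemes

print(segment("unhappiness"))  # ['un-', 'happi', '-ness']
```

Note that the recovered root surfaces as happi rather than happy: spelling changes at morpheme boundaries (y → i) are one reason real morphological analyzers need more than plain string stripping.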
Importance in NLP
Understanding word structure is essential for:
- Stemming and lemmatization
- Morphological analysis
- Machine translation
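The contrast between the first two applications can be shown in a small sketch: stemming strips suffixes by rule, while lemmatization falls back on a dictionary for irregular forms. The suffix list and lemma table here are illustrative:

```python
# Sketch contrasting stemming (rule-based suffix stripping) with
# lemmatization (dictionary lookup); both mappings are illustrative.
def stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Irregular forms that no suffix rule can recover.
LEMMAS = {"went": "go", "better": "good", "mice": "mouse"}

def lemmatize(word):
    return LEMMAS.get(word, stem(word))

print(stem("playing"))    # play
print(lemmatize("went"))  # go (stemming alone cannot recover this)
```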
Morphological Models and Illustrations
Dictionary Lookup Model
The Dictionary Lookup Model stores all valid word forms in a lexicon (dictionary). Morphological analysis is performed by directly matching the input word with entries in the dictionary.
How it Works
- Input word is received.
- System searches the dictionary.
- If found, it retrieves features (lemma, POS, tense, etc.).
- If not found, it is marked as unknown or an error.
Illustration
Input: went
Lookup Result: Lemma → go; POS → Verb; Tense → Past
Input: cats
Lookup Result: Lemma → cat; POS → Noun; Number → Plural
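The lookup procedure above amounts to a dictionary access; a minimal sketch, with an illustrative two-entry lexicon:

```python
# Minimal dictionary-lookup morphological analyzer.
# The lexicon entries are illustrative.
LEXICON = {
    "went": {"lemma": "go", "pos": "Verb", "tense": "Past"},
    "cats": {"lemma": "cat", "pos": "Noun", "number": "Plural"},
}

def analyze(word):
    # Return stored features if the word is known, else flag it.
    return LEXICON.get(word.lower(), {"error": "unknown word"})

print(analyze("went"))  # {'lemma': 'go', 'pos': 'Verb', 'tense': 'Past'}
print(analyze("goed"))  # {'error': 'unknown word'}
```

The failure on goed illustrates the limitation noted below: the model cannot generalize beyond its stored entries.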
Pros and Cons
- Advantages: Simple, fast, and accurate for known words.
- Limitations: Requires large storage, cannot handle unseen words, and lacks generalization.
Unification-Based Morphology
This model represents words using feature structures (attribute–value pairs). Morphological rules combine stems and affixes through unification (feature matching).
Core Idea
Words are built by merging compatible features.
Illustration
Lexical Entry (Stem): play (Category: Verb, Tense: Base)
Suffix Rule (-ed): Requires: Verb, Adds: Tense = Past
Unification Process: play + ed → played (Verb, Past)
Stem: cat (Category: Noun, Number: Singular)
Suffix (-s): Requires: Noun, Adds: Number = Plural
Result: cats (Noun, Plural)
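The unification steps above can be sketched with feature structures as plain dictionaries; the rule format is a simplified assumption for illustration:

```python
# Sketch of unification-based affixation: a suffix rule unifies with a
# stem's feature structure if its required features match, then adds
# its own features. The rule representation is a simplification.
def unify(stem_feats, rule):
    for key, value in rule["requires"].items():
        if stem_feats.get(key) != value:
            return None  # unification fails: incompatible features
    result = dict(stem_feats)
    result.update(rule["adds"])
    return result

past_ed = {"requires": {"cat": "Verb"}, "adds": {"tense": "Past"}}
plural_s = {"requires": {"cat": "Noun"}, "adds": {"number": "Plural"}}

play_stem = {"cat": "Verb", "tense": "Base"}
cat_stem = {"cat": "Noun", "number": "Singular"}

print(unify(play_stem, past_ed))  # {'cat': 'Verb', 'tense': 'Past'}
print(unify(cat_stem, past_ed))   # None: -ed does not attach to nouns
```

The None result shows how feature matching blocks ill-formed combinations such as attaching a verbal suffix to a noun.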
Pros and Cons
- Advantages: Handles unseen forms and offers compact representation.
- Limitations: Computationally complex and requires detailed grammar rules.
Morphology in Language Modeling and Semantics
How Morphological Structure Helps Language Modeling
Morphological structure improves language models by capturing internal word patterns rather than treating words as isolated tokens.
- Reduces Data Sparsity: Different word forms share the same root (e.g., play, plays, played).
- Better Generalization: Models can understand unseen forms (e.g., inferring walked from walk).
- Improved Probability Estimation: Models learn shared features instead of independent probabilities.
- Handles Morphologically Rich Languages: Essential for languages like Hindi, Turkish, and Finnish.
- Efficient Vocabulary Usage: Breaking words into morphemes results in a smaller vocabulary.
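The sparsity and vocabulary points above can be made concrete with a toy comparison: word-level modeling needs one entry per surface form, while morpheme-level modeling shares the stem across forms. The suffix list and `##` subword convention are illustrative:

```python
# Toy illustration of why morpheme-level units shrink the vocabulary.
words = ["play", "plays", "played", "playing", "walk", "walks", "walked"]

word_vocab = set(words)  # one entry per surface form: 7 tokens
morph_vocab = set()
for w in words:
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix):
            # Split into stem + suffix unit (marked "##" by convention).
            morph_vocab.update([w[: -len(suffix)], "##" + suffix])
            break
    else:
        morph_vocab.add(w)

print(len(word_vocab), len(morph_vocab))  # 7 vs 5
print(sorted(morph_vocab))  # ['##ed', '##ing', '##s', 'play', 'walk']
```

Seven surface forms reduce to five shared units, and a form never seen in training, such as walking, would still decompose into known pieces.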
Handling Semantics in NLP
Semantics deals with meaning interpretation through various techniques:
- Lexical Semantics: Understanding word relations like synonymy, antonymy, and polysemy (e.g., WordNet).
- Distributional Semantics: Meaning from context using word embeddings (e.g., king – man + woman ≈ queen).
- Compositional Semantics: Meaning of phrases from their parts (e.g., “red apple”).
- Semantic Role Labeling (SRL): Identifies “who did what to whom” (Agent, Action, Object).
- Named Entity Recognition (NER): Identifies real-world entities like people or locations.
- Contextual Models: Modern transformers (BERT, GPT) capture context-dependent meaning.
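The distributional-semantics analogy can be sketched with hand-made two-dimensional vectors (the dimensions loosely encode "royalty" and "gender"); real embeddings are learned from corpora and have hundreds of dimensions:

```python
# Toy vector-arithmetic analogy: king - man + woman ~ queen.
# The vectors are hand-made for illustration, not learned.
vectors = {
    "king":  (0.9, 0.9),
    "queen": (0.9, 0.1),
    "man":   (0.1, 0.9),
    "woman": (0.1, 0.1),
}

def add(a, b): return (a[0] + b[0], a[1] + b[1])
def sub(a, b): return (a[0] - b[0], a[1] - b[1])

target = add(sub(vectors["king"], vectors["man"]), vectors["woman"])

# Nearest stored vector to the target, by squared Euclidean distance.
nearest = min(vectors, key=lambda w: (vectors[w][0] - target[0]) ** 2
                                     + (vectors[w][1] - target[1]) ** 2)
print(nearest)  # queen
```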
Multilingual Tokenization and Parsing Challenges
Tokenization and parsing are complex in multilingual content due to differences in script, grammar, and morphology.
Tokenization Challenges
- Script Diversity: Different writing systems (Latin, Devanagari, Arabic).
- Word Boundary Ambiguity: Languages like Chinese (我喜欢学习) lack spaces.
- Agglutinative Languages: Long complex words formed by many morphemes (e.g., Turkish).
- Clitics and Contractions: Examples like l’amour or don’t.
- Code-Switching: Mixing languages (e.g., “Kal meeting hai”).
- Entities and Emojis: Ensuring @OpenAI or emojis are not split incorrectly.
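Two of the challenges above, clitic splitting and keeping entities whole, can be sketched with a small regex tokenizer. The patterns are illustrative and handle only the English n't clitic and @-mentions:

```python
import re

# Sketch of a regex tokenizer that keeps @-mentions whole and splits
# off the English clitic "n't"; the patterns are illustrative.
TOKEN = re.compile(r"@\w+|\w+n't|\w+|[^\w\s]")

def tokenize(text):
    tokens = []
    for tok in TOKEN.findall(text):
        if tok.endswith("n't") and len(tok) > 3:
            tokens.extend([tok[:-3], "n't"])  # don't -> do + n't
        else:
            tokens.append(tok)
    return tokens

print(tokenize("Follow @OpenAI, don't stop!"))
# ['Follow', '@OpenAI', ',', 'do', "n't", 'stop', '!']
```

Note the alternation order matters: `@\w+` must come before `\w+` so that the mention is matched as one token rather than split at the `@`.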
Parsing Challenges
- Grammatical Variations: Different word orders like SVO (English) vs. SOV (Hindi).
- Rich Morphology: Case, gender, and tense encoded in suffixes.
- Structural Ambiguity: One sentence resulting in multiple parse trees.
- Resource Scarcity: Lack of annotated corpora for many languages.
- Idioms: Literal parsing fails for expressions like “kick the bucket.”
Predicate-Argument Structure Examples
A predicate expresses an action or state, while arguments are the participating entities.
- Example 1: Ram ate a mango.
  Predicate: ate; Arguments: Agent (Ram), Theme (mango).
  Structure: eat(Ram, mango)
- Example 2: She gave him a book.
  Predicate: gave; Arguments: Agent (She), Recipient (him), Theme (book).
  Structure: give(She, him, book)
- Example 3: The boy kicked the ball.
  Predicate: kicked; Arguments: Agent (boy), Theme (ball).
  Structure: kick(boy, ball)
- Example 4 (State): The sky is blue.
  Predicate: is; Arguments: Theme (sky), Attribute (blue).
  Structure: be(sky, blue)
- Example 5 (Location): They live in Delhi.
  Predicate: live; Arguments: Agent (They), Location (Delhi).
  Structure: live(They, Delhi)
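Predicate-argument structures like these can be represented as simple records; the role labels follow the examples above, and the record type here is an illustrative choice:

```python
from collections import namedtuple

# Sketch of predicate-argument structures as simple records; the
# role labels (Agent, Theme, Recipient, ...) follow the examples above.
Pred = namedtuple("Pred", ["predicate", "args"])

examples = [
    Pred("eat",  {"Agent": "Ram", "Theme": "mango"}),
    Pred("give", {"Agent": "She", "Recipient": "him", "Theme": "book"}),
    Pred("be",   {"Theme": "sky", "Attribute": "blue"}),
]

for p in examples:
    arg_list = ", ".join(p.args.values())
    print(f"{p.predicate}({arg_list})")  # e.g., eat(Ram, mango)
```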
