Understanding the Pumping Lemma for Regular Languages

Pumping Lemma

  • The Pumping Lemma states that for any regular language L, there exists a constant ‘M’ (related to the number of states in the accepting finite automaton) such that any word ‘w’ in L with length greater than or equal to ‘M’ can be decomposed into three substrings, x, y, and z (w = xyz), where:
  • The substring ‘y’ is not empty.
  • The length of the concatenation of ‘x’ and ‘y’ is less than or equal to ‘M’.
  • For any non-negative integer ‘n’, the string xynz is also in L. (This means we can ‘pump’ the ‘y’ substring any number of times, and the resulting string will still be in the language).

Why is y not null when pumping 0 times?

Within the string decomposition, ‘y’ cannot be null. Even when pumping 0 times (effectively removing ‘y’), the Pumping Lemma requires ‘y’ to be a non-empty substring. This might seem less obvious initially but is crucial for the lemma’s functionality.

Example: A Regular Language

Let’s illustrate the Pumping Lemma with a regular language accepted by a finite automaton with six states. Consider the word w = bbbababa, which has more than six letters and therefore must correspond to a path in the automaton that includes a circuit.

Breaking Down the Word ‘w’

  • Any word with six or more letters must correspond to a path that includes a circuit.
  • Some words with fewer than six letters also correspond to paths with circuits, such as baaa.
  • In our example, w = bbbababa, the substring bba corresponds to a circuit in the automaton. This is our ‘y’ part.
  • The ‘x’ part is the initial ‘b’.
  • The ‘z’ part is the remaining substring ‘baba’.

Pumping the ‘y’ Substring

Let’s see what happens when we pump the ‘y’ substring (bba) twice. The resulting string is xyyz = bbbabbababa. This new string is still accepted by the automaton, demonstrating the Pumping Lemma.

w = b bba baba

xyyz = b bba bba baba

ch_10_exapnle01_fig_04

Pumping Lemma Version II

The second version of the Pumping Lemma provides a more specific condition on the length of the ‘x’ and ‘y’ substrings. It states:

Let L be an infinite language accepted by a finite automaton with ‘N’ states. Then for all words ‘w’ in L that have more than ‘N’ letters, there are strings ‘x’, ‘y’, and ‘z’ where:

  • ‘y’ is not null.
  • length(x) + length(y) does not exceed ‘N’.
  • w = xyz
  • All strings of the form xynz (for n = 1, 2, 3, …) are in L.

Why is Pumping Lemma Version II Needed?

The second version is necessary because the first version might not be sufficient to prove certain languages are non-regular. For example, consider the language of palindromes (PALINDROME).

Example: PALINDROME

Using the first version of the Pumping Lemma, we could choose x = a, y = b, and z = a for the word ‘aba’. This decomposition satisfies the conditions of the first version, as all strings of the form abna are palindromes. However, this doesn’t contradict the regularity of PALINDROME.

Using the second version, let’s assume a finite automaton with 77 states accepts PALINDROME. Consider the palindrome w = a80ba80. Since ‘w’ has more letters than the automaton has states, we can break it into x, y, and z according to the second version. However, since length(x) + length(y) cannot exceed 77, both ‘x’ and ‘y’ must consist only of ‘a’s. Pumping ‘y’ will result in a string with more ‘a’s at the beginning than at the end, which is not a palindrome. This contradicts the second version of the Pumping Lemma, proving that PALINDROME is non-regular.

Pumping Lemma Version II Example: PRIME

Let’s consider the language PRIME = {ap where p is a prime number} = {aa, aaa, aaaaa, aaaaaaa, …}.

Is PRIME a regular language? Let’s assume it is and that a finite automaton with 345 states accepts it. Choose a prime number larger than 345, for example, 347. Then a347 can be broken into x, y, and z according to the Pumping Lemma. Let’s pump ‘y’ 348 times, resulting in the string xy348z = a347y347. Since ‘y’ is a non-empty string of ‘a’s, let’s say y = am for some integer ‘m’. Then a347y347 = a347(am)347 = a347 + 347m = a347(m+1). Since ‘m’ is not zero, 347(m+1) is not a prime number. This contradicts the definition of PRIME, as all strings in PRIME should have a prime number of ‘a’s. Therefore, PRIME is non-regular.

Conclusion

The Pumping Lemma, especially its second version, is a powerful tool for proving that certain languages are not regular. It highlights the limitations of finite automata and provides insights into the fundamental properties of regular languages.

Key Takeaway

The Pumping Lemma helps us understand the limitations of what can be recognized by finite automata and provides a valuable tool for classifying languages within the Chomsky Hierarchy.