Programming Language Design: Principles, Syntax, and BNF

Posted on Aug 22, 2025 in Computers

Programming Language Design Principles

Orthogonality

Orthogonality in programming languages is a principle aimed at providing maximum generality: language elements can be combined freely, with no restrictions or special cases. A language is orthogonal when no combination of its constructs is invalid, so a programmer does not have to remember exceptions to a rule.

Example of Non-Orthogonality

A common example of a lack of orthogonality is when parameters are automatically passed by value, but arrays are automatically passed by reference. This creates an inconsistent rule set.

Advantages of Orthogonal Design

Ease of Learning: An orthogonal language is generally easier to learn.
Simpler Writing: It allows for simpler code writing due to the absence of exceptions or special cases to remember.

Disadvantages of Orthogonality

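Because almost any combination of constructs is legal, programs in a highly orthogonal language tend to compile without errors even when a combination is unintended, so the compiler catches fewer mistakes. Such a language may also prove limiting when handling complex data structures or specific programming paradigms.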

Syntactic Clarity

Syntactic Clarity dictates that an instruction or operation should have a single, unambiguous way of being written within the language. This ensures consistency and reduces programmer confusion.

Language Semantics

The Semantics of a Programming Language is the meaning attached to each of its instructions, operations, and other constructs. It defines what a program does when executed.

Language Orientation

Language Orientation involves providing a syntax that aligns with the language’s intended purpose or historical context (e.g., its commitment to specific programming paradigms or user communities).

Extensibility

Extensibility lets programmers build structures the language does not provide out of the features it does provide. It allows users to define their own data types (EDTs) or code their own operators, enhancing flexibility.
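As a minimal sketch of extensibility, the Python snippet below defines a user data type and gives it its own `+` operator. The `Vec2` class and its fields are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """A user-defined 2D vector type (hypothetical example)."""
    x: float
    y: float

    def __add__(self, other):
        # User-coded "+" operator: add component by component.
        if not isinstance(other, Vec2):
            return NotImplemented
        return Vec2(self.x + other.x, self.y + other.y)


total = Vec2(1, 2) + Vec2(3, 4)  # uses the operator defined above
```

The language itself knows nothing about vectors; the type and its operator are built entirely from features the language already offers.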

Portability

Portability is the crucial ability to use a software program on different computer systems or environments without significant modifications.

Efficiency Considerations

Efficiency in Translation

Efficiency in Translation is achieved through fast compilation, often a key characteristic of education-oriented programming languages.

Efficiency in Execution

Efficiency in Execution is achieved when the routines a language provides and the code it generates are highly optimized, which matters most for programs that are run frequently.

Efficiency in Construction

Efficiency in Construction refers to programming languages that enable quick and clear program development, often providing robust assistance in diagnosis and debugging.

Programming Language Syntax

Defining Syntax

Syntax is the set of rules that determine whether the statements of a program are well-formed. Its primary goal is to provide a notation that enables clear communication between the programmer and the compiler.

The Role of a Compiler

A Compiler is a translator that checks whether a source program adheres to the language’s defined syntax rules and, if it does, translates it into machine-executable code.
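The check-then-translate role can be sketched with Python's built-in `compile()` function. Note the assumption: Python translates to bytecode rather than machine code, so this only illustrates the syntax-checking step. The helper name `is_well_formed` is hypothetical.

```python
def is_well_formed(source: str) -> bool:
    """Return True if the source text obeys Python's syntax rules."""
    try:
        # compile() parses the text and raises SyntaxError if it is malformed.
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


accepted = is_well_formed("x = 1 + 2")  # a well-formed assignment
rejected = is_well_formed("x = = 1")    # a stray '=' breaks the syntax
```

Only the well-formed program reaches the translation step; the malformed one is rejected before any code is produced.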

Key Syntactic Criteria

Effective programming language design considers several syntactic criteria:

Readability: Programming language instructions should be easily interpreted and understood by the programmer.
Ease of Writing: The language should feature simple, straightforward elements, avoiding overly complex or far-fetched constructs.
Translation Facility: The compiler should be able to generate efficient and minimal machine code from the source.
Lack of Ambiguity: A structure or instruction should ideally have only one clear meaning to prevent misinterpretation.

Fundamental Syntactic Elements

The building blocks of a programming language’s syntax include:

Set of Characters
Identifiers
Operator Symbols
Reserved Keywords
Comments
Abbreviations
Delimiters
Phrases
Statements

Identifiers

Identifiers are the names a programmer gives to variables, functions, and other program entities. Many languages cap the length of an identifier, and the cap depends on the language version. Later versions often allow longer identifiers with a wider range of characters, which makes code clearer.

Keywords

A Keyword is an identifier that forms an unchanging, predefined part of an instruction.

Reserved Keywords

A Reserved Keyword is a special type of keyword that cannot be used as a user-defined identifier.

Advantages: Simplifies syntactic analysis and improves error detection during compilation.
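As an illustration in Python, the standard `keyword` module reports which words are reserved. The helper name `usable_as_identifier` is hypothetical.

```python
import keyword


def usable_as_identifier(name: str) -> bool:
    """True if name is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


ok = usable_as_identifier("total")       # an ordinary name
reserved = usable_as_identifier("for")   # 'for' is reserved
malformed = usable_as_identifier("2fast")  # identifiers cannot start with a digit
```

Because `for` can never be a variable name, a parser that sees it knows immediately that a loop follows, which is the simplification of syntactic analysis mentioned above.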

Comments

Comments provide essential self-documentation within the source code, explaining logic and intent without affecting program execution.

Abbreviations

Abbreviations are used within a language to enhance legibility and often to shorten common constructs.

Delimiters

Delimiters are special symbols or sequences used to mark the beginning or end of a syntactic unit.

Advantages: Improves readability, simplifies parsing for compilers, and helps remove ambiguity in language constructs.

Code Formatting: Fixed vs. Free

Fixed Format

In a Fixed Format language, each part of an instruction must be placed in specific columns of the line, which requires precise alignment.
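A small sketch of how a translator might read a fixed-format line. It assumes the FORTRAN 77 column layout as an example (columns 1-5 for a statement label, column 6 for a continuation mark, columns 7-72 for the statement); the function name `split_fixed_line` is hypothetical, and comment lines are not handled.

```python
def split_fixed_line(line: str) -> dict:
    """Split one fixed-format source line into its column fields."""
    padded = line.ljust(72)  # make sure every column exists
    return {
        "label": padded[0:5].strip(),             # columns 1-5
        "continuation": padded[5] not in " 0",    # column 6: any mark but blank or 0
        "statement": padded[6:72].strip(),        # columns 7-72; the rest is ignored
    }


first = split_fixed_line("  100 X = X + 1")
second = split_fixed_line(" " * 5 + "&" + "    + Y")
```

The meaning of each character depends on the column it occupies, so shifting the text by a single space changes how the line is read.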

Free Format

In a Free Format language, instructions can be written without concern for their position or length on the line, offering greater flexibility to the programmer.

Expressions

Expressions are combinations of operators and operands, such as identifiers, constants, and function calls, that evaluate to a value.

Imperative Expressions

Imperative Expressions form the basic operations that allow instructions to change the state of variables within a program.

Functional Expressions

Functional Expressions form the basic control flow sequence that manages program execution, often by evaluating functions and returning results without side effects.
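The contrast between the two styles can be sketched in Python by computing the same value both ways. The function names are hypothetical.

```python
from functools import reduce


def sum_squares_imperative(values):
    total = 0
    for v in values:
        total += v * v  # each assignment changes the state of 'total'
    return total


def sum_squares_functional(values):
    # No variable is reassigned: the result is built purely by
    # evaluating a function over the values.
    return reduce(lambda acc, v: acc + v * v, values, 0)
```

Both functions return the same result for the same input; they differ only in whether the computation proceeds by changing state or by evaluating nested function applications.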

Statements

Simple Statements

Simple Statements are single, atomic instructions that do not allow nested statements within them.

Structured Statements

Structured Statements are control flow constructs (like loops or conditionals) that allow nested statements, enabling complex program logic.
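To make the distinction concrete, the sketch below uses Python's standard `ast` module to label each top-level statement of a program. The `STRUCTURED` tuple is an assumption: it is a partial list of Python's nesting statements, chosen for illustration, and `classify` is a hypothetical helper.

```python
import ast

# Statement kinds that can contain nested statements (partial list).
STRUCTURED = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def classify(source: str) -> list:
    """Label each top-level statement as 'simple' or 'structured'."""
    tree = ast.parse(source)
    return [
        "structured" if isinstance(node, STRUCTURED) else "simple"
        for node in tree.body
    ]


program = "x = 1\nfor i in range(3):\n    x += i\nprint(x)"
labels = classify(program)
```

The assignment and the call are simple statements, while the loop is structured because it nests the statement `x += i` inside it.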

Grammar and Formal Language Description

Understanding Grammar

Grammar is the formal definition of a programming language’s syntax. It consists of a set of rules that precisely specify the writing conventions and valid constructs of the language.

Metagrammars

A Metagrammar is a formal grammar used for describing other languages. Examples include Backus-Naur Form (BNF), syntax diagrams, and CBL (Common Base Language).

Backus-Naur Form (BNF)

Backus-Naur Form (BNF), developed by John Backus and Peter Naur to describe ALGOL, is a notation for context-free grammars and so gives a precise way to describe formal languages. It is used for the grammars of programming languages, instruction sets, and communication protocols.

BNF Metasymbols Explained

In BNF, specific metasymbols are used to define grammar rules:

< >: Denotes a non-terminal symbol (a syntactic category that can be further broken down).
::=: Means “is defined as” or “produces”.
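|: Represents “or”, indicating alternative definitions.
( ): Used for grouping. Extended BNF (EBNF) adds further metasymbols, such as [ ] for optional parts and { } for repetition.
Symbols written without angle brackets, such as keywords or character constants, are terminals: they appear in the program exactly as written.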

Note: The symbols < >, |, ( ), ::= are not part of the language being defined but are part of the description mechanism, known as metasymbols.
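The role of the metasymbols can be sketched by reading a BNF rule into a data structure. This is a simplified illustration, not a full BNF reader: it assumes symbols are separated by spaces and that `|` and `::=` never occur as terminals. The function names and the `<sign>` rule are hypothetical.

```python
def parse_rule(line: str):
    """Split one BNF rule into its left-hand side and its alternatives."""
    head, _, body = line.partition("::=")        # '::=' means "is defined as"
    alternatives = [alt.split() for alt in body.split("|")]  # '|' means "or"
    return head.strip(), alternatives


def is_nonterminal(symbol: str) -> bool:
    """Non-terminals are the symbols wrapped in angle brackets."""
    return symbol.startswith("<") and symbol.endswith(">")


name, options = parse_rule("<sign> ::= + | -")
```

After parsing, the metasymbols are gone: only the rule name and its alternatives remain, which shows that they belong to the description mechanism and not to the language being described.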

BNF Example: ‘for’ Loop Structure

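Here’s an example illustrating the BNF definition for a ‘for’ loop structure. In it, FOR, the parentheses, the braces, the semicolons, and the commas are terminals of the language being described. The square brackets mark an optional part and the ellipsis (...) marks repetition, both borrowed from extended notations. The non-terminals that <expression> refers to are left undefined: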

<for_statement> ::= FOR ( <initialization> ; <condition> ; <increment> ) <statement>
<initialization> ::= <expression> | <expression> , <expression>
<condition> ::= <expression> | <expression> , <expression>
<increment> ::= <expression> | <expression> , <expression>
<statement> ::= <simple_statement> | <compound_statement>
<simple_statement> ::= <expression> [ ; <expression> ]
<compound_statement> ::= { <simple_statement> [ ; <simple_statement> ]... }
<expression> ::= <assignment_expression> | <increment_expression> | <conditional_expression>
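The rules above can be turned into a small recursive-descent recognizer, with one method per non-terminal. This is a sketch under stated assumptions: tokens are separated by spaces, and because the alternatives of <expression> are not defined here, any token that is not a terminal stands in for an expression. All class and function names are hypothetical.

```python
# Terminals of the grammar; any other token stands in for an <expression>.
TERMINALS = {"FOR", "(", ")", "{", "}", ";", ","}


class ForRecognizer:
    def __init__(self, text):
        self.tokens = text.split()  # assumption: space-separated tokens
        self.pos = 0

    def peek(self, offset=0):
        index = self.pos + offset
        return self.tokens[index] if index < len(self.tokens) else None

    def expect(self, terminal):
        if self.peek() != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.peek()!r}")
        self.pos += 1

    def expression(self):
        token = self.peek()
        if token is None or token in TERMINALS:
            raise SyntaxError(f"expected an expression, found {token!r}")
        self.pos += 1

    def clause(self):
        # <initialization>, <condition>, <increment>:
        #   <expression> | <expression> , <expression>
        self.expression()
        if self.peek() == ",":
            self.expect(",")
            self.expression()

    def simple_statement(self):
        # <expression> [ ; <expression> ]
        self.expression()
        following = self.peek(1)
        if self.peek() == ";" and following is not None and following not in TERMINALS:
            self.expect(";")
            self.expression()

    def compound_statement(self):
        # { <simple_statement> [ ; <simple_statement> ]... }
        self.expect("{")
        self.simple_statement()
        while self.peek() == ";":
            self.expect(";")
            self.simple_statement()
        self.expect("}")

    def statement(self):
        # <simple_statement> | <compound_statement>
        if self.peek() == "{":
            self.compound_statement()
        else:
            self.simple_statement()

    def for_statement(self):
        self.expect("FOR")
        self.expect("(")
        self.clause()
        self.expect(";")
        self.clause()
        self.expect(";")
        self.clause()
        self.expect(")")
        self.statement()


def is_for_statement(text):
    """Return True if text is derivable from <for_statement>."""
    parser = ForRecognizer(text)
    try:
        parser.for_statement()
    except SyntaxError:
        return False
    return parser.pos == len(parser.tokens)  # reject trailing tokens
```

For instance, `is_for_statement("FOR ( i=0 ; i<n ; i++ ) { total+=i ; count++ }")` accepts the input, while dropping the third clause, as in `"FOR ( i=0 ; i<n ) x=i"`, makes it fail at the point where the grammar requires a second semicolon.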
