Programming Language Concepts: Binding, Scope, and Data Structures

Binding, Typing, and Namespaces

A binding is an association between an entity and an attribute, such as between a variable and its type or value, or between an operation and a symbol.

Variable Scope: Static vs. Dynamic

Variable Scope is the range of statements in which the variable is visible. A variable is visible in a statement if it can be referenced or assigned in that statement.

  • Static Scoping: The scope of a variable is determined prior to program execution and remains unchanged throughout (can be located prior to execution).
  • Dynamic Scoping: The scope is determined at run time and can change during execution.

Basic Types and Their Implementation

The type of a variable determines:

  • The range of values the variable can store.
  • The set of operations defined for values of that type.

Types instruct the compiler and the CPU on how to interpret the ones and zeros in memory. The type dictates how much memory to allocate and specifies how data should be interpreted.

Array Implementation Fundamentals

Row-major order and column-major order are methods for storing multidimensional arrays in linear storage, such as Random Access Memory (RAM).

Variable Attributes and Binding Times

Variable Attributes: Name, Address, Value, Type, Lifetime, Scope

  • Name: A string of characters used to identify some entity in a program (note: not all variables have names).
  • Address: The machine address with which the variable is associated. A variable can have different addresses at different places in the program (called l-value, referring to the left side of an assignment).
  • Value: The contents of the memory cell(s) associated with the variable (called r-value, referring to the right side of an assignment).
  • Type: Determines the range of values the variable can store and the set of defined operations.
  • Lifetime: The time during which the variable is bound to a specific memory location.
  • Scope: The range of statements over which the variable is visible.

Aliases, Binding, and Binding Times

  • Aliases: Two or more variables bound to the same storage address. This negatively affects readability but is difficult to eliminate entirely (e.g., created via pointers, reference variables, C/C++ unions).
  • Binding: The association of attributes with program entities (e.g., between a variable and its type/value, or an operation and a symbol).
    • Static Binding: Occurs before run time (e.g., C).
    • Dynamic Binding: Occurs during execution (e.g., JavaScript, Python).
  • The time at which a binding takes place is called Binding Time.
  • Bindings can take place at: Language design time, language implementation time, compile time, load time, link time, or run time.

Naming Conventions and Constraints

  • Name Length: Varies by language (e.g., Fortran 95: 31 characters; C99: no length limitation, first 63 significant; Java, C#, Ada: no length limit, all significant).
  • Naming Styles:
    • Underscore Notation (_): Words separated by underscores (e.g., my_stack). Largely replaced by camel notation.
    • Camel Notation: All words capitalized except the first (e.g., myStack).
  • Case-Sensitivity: Uppercase and lowercase letters are treated as different characters, resulting in distinct names (common in C-based languages like C, C++, Java, C#). Disadvantage: Lowered readability since similar-looking names can have completely different meanings.
  • Reserved Word: A special word of a programming language that is prohibited from being used as a name. Drawback: Too many reserved words can lower writability (e.g., COBOL has 300).

L-value Definition

The L-value is the address of a variable. It is named so because the address is what is required when the name of a variable appears on the left side of an assignment statement.

Note: Addresses can vary (e.g., stack-dynamic variables and redeclarations).

Type Concepts and Memory Cells

  • Example: The int type in Java specifies a value range (-2,147,483,648 to 2,147,483,647) and arithmetic operations.
  • Physical Memory Cells: Typically byte-sized (eight bits). This is too small for most program variables.
  • Abstract Memory Cells: Have the size necessary for the variable with which they are associated. (The term “memory cell” usually refers to the abstract memory cell).

Specific Binding Times

Binding Time is the time when binding takes place. Bindings can occur at:

  • Language Design Time: Binding operator symbols to operations.
  • Language Implementation Time: Binding the floating-point type to a specific representation.
  • Compile-time: Binding a variable to a type in C or Java.
  • Link/Load Time: Binding a C or C++ static variable to a memory cell.
  • Run-time: Binding a non-static local variable to a memory cell.

Static vs. Dynamic Binding

  • Static Binding: Occurs before run time begins and remains unchanged throughout program execution.
  • Dynamic Binding: Occurs at run time and can change during the course of program execution.

Note: The physical binding of a variable to a storage cell is complex, as the page or segment of the address space may be moved in and out of memory many times during execution (abstracted away by the OS).

Understanding Explicit and Implicit Static Type Binding

  • Explicit Declaration: A statement listing variable names and specifying their particular type (e.g., float a = 4.20; int b = (int)a; – casting/explicit type conversion).
  • Implicit Declaration: Uses default conventions to associate variables with types (done by the compiler, interpreter, or language processor, e.g., float a = 4.20;).

Distinguishing Declaration from Definition

  • Declaration: Tells the compiler the name of the variable, its type, and potentially its initial value (e.g., int f;).
  • Definition: Where the variable gets stored (memory is allocated) (e.g., int f(int a) { return a; }).

Example of Dynamic Type Binding

dynamic any; (Illustrates a variable whose type is determined at runtime.)

Storage Binding and Variable Lifetime

Four Categories of Storage Binding

  • Static Variables: Bound to memory cells before execution begins and remain bound until termination.
  • Stack-Dynamic Variables: Storage bindings are created when declaration statements are elaborated (binding process and storage allocation). Types are typically statically bound.
  • Explicit Heap-Dynamic Variables: Nameless/abstract memory cells allocated and deallocated by explicit run-time instructions written by the programmer (e.g., using new/delete).
  • Implicit Heap-Dynamic Variables: Bound to heap storage only when assigned values (common in scripting languages).

History Sensitivity of Static Variables

It is sometimes convenient to have subprograms that are history sensitive. Such a subprogram must have local static variables.

Use Case: History sensitive variables are useful in data manipulation subprograms where an operation is performed on a variable, the function exits, and then the function is called again, retaining the previous state without passing the variable as a parameter.

Relationship of Scope to Stack-Dynamic Lifetime

Static scope is a textual concept, while lifetime is a temporal concept.

Heap-Dynamic Variables Management

  • Advantages of Explicit Heap-Dynamic Variables: Provides a convenient way to build complex data structures.
  • Disadvantages of Explicit Heap-Dynamic Variables: Cost of references to variables, complexity of storage management (manual deallocation risks memory leaks).

Implicit Heap-Dynamic Management and Garbage Collection

Languages like Java have no way of explicitly destroying a heap-dynamic object; they rely on implicit garbage collection to reclaim unused memory.

Type Systems and Compatibility

Static Type Checking, Coercion, and Dynamic Type Checking

  • Static Type Checking: Types are checked before run time (at compile time).
  • Dynamic Type Checking: Types are checked “on the fly,” during execution.
  • Coercion: Automatic conversion of a data type to another (implicit type conversions).

Strong Typing Concepts

A language is strongly typed if type errors are always detected. This requires that the types of all operands can be determined either at compile time or run time.

Java and C# are considered nearly strongly typed, as there are few implicit ways type errors can go undetected.

Type Compatibility and Equivalence (Name vs. Structural)

  • Name Type Equivalence: Two variables have equivalent types if they are defined in the same declaration or declarations that use the same type name.
  • Structure Type Equivalence: Two variables have equivalent types if their types have identical structures.

Scope Rules and Referencing Environments

Static Scope Rules and Name Resolution

Static Scope Rules:

  1. The correct declaration is found by first searching declarations within the current subprogram.
  2. If no declaration is found for the variable, the search continues in the declarations of the subprogram’s static parent (the subprogram that textually declares the current subprogram).
  3. If the variable remains undeclared after searching all static ancestors (parents of parents), an undeclared variable error is reported.

Hiding Possibilities of Scope

A declaration for a variable effectively hides any declaration of a variable with the same name in the larger enclosing scope.

Problems with Static Scope

Static scope can force programs into unusual configurations:

  • It allows more access to both subroutines and variables than is necessary.
  • It encourages the use of far more global variables than necessary.
  • It may not reflect the underlying conceptual design of the program.

Dynamic Scope Rules and Ancestry

(Requires following an example of name resolution given a sequence of calls.)

Referencing Environments

The Referencing Environment is the collection of all variables that are visible in a specific statement (another way of capturing scope).

Data Types: Primitive and User-Defined

Primitive Data Types vs. User-Defined Data Types

  • User-Defined Types: Improve readability and modifiability by allowing meaningful names for types.
  • Primitive Data Types: Data types that are not defined in terms of other types.

Basic Primitive Data Types

  • Integer: Represents whole numbers.
  • Floating-Point: Models real numbers.
  • Decimal: Stores a fixed number of decimal digits with an implied decimal point.
  • Boolean: Represents true/false values.
  • Character: Typically uses ASCII (32-bit Unicode in Fortran).
  • Character Strings: Can be implemented as arrays, built-in types, or objects. Issues include null-termination errors, static or dynamic length, associated bookkeeping, and three styles of storage management for dynamic strings. Regular expressions are used for pattern matching.

Enumeration Types

An enumeration type is a type in which all the possible values, which are named constants, are provided (enumerated) in the definition. Example: enum meats {Beef, Pork, Chicken, Turkey, Veal, etc…}. The enum type is an ordinal value which assigns a number value by order to identifiers in its list.

Design Issues for Enumeration Types (p. 259)
  1. Is an enumeration constant allowed to appear in more than one type definition, and how is that checked?
  2. Are enumeration values coerced to integer?
  3. Are any other types coerced to an enumeration?
Advantages of Enumeration Types (Section 6.4.3)

Enumeration types provide advantages in both readability and reliability:

  1. No arithmetic operations are legal on enumeration types.
  2. No enumeration variable can be assigned a value outside its defined range.

Note: Because C treats enums like integers, it often fails to provide these two reliability advantages.

Subrange Types

What are subrange types? How do they differ from derived types in terms of type checking?

Array Types and Storage Layout

Design Issues Specific to Arrays (p. 263)

  1. What types are legal for subscripts?
  2. Are subscripting expressions in element references range checked?
  3. When are subscript ranges bound?
  4. When does array allocation take place?
  5. Are ragged or rectangular multidimensional arrays allowed, or both?
  6. Can arrays be initialized when their storage is allocated?
  7. What kind of slices are allowed, if any?

Fortran Array Indexing Syntax

Fortran uses the same syntax for array indexing as for function calls because, historically, an array access is viewed as a mathematical function, and at the time, keyboards did not commonly have square brackets [ ].

Subscript Range Checking

Subscript range checking involves verifying that the index used to access an array element falls within the defined bounds. It must be done for safety, but not all languages enforce it.

Categories of Array Storage Binding

  • Static Array:
    • Subscript ranges are statically bound and storage allocation is static (done before run time).
    • Advantage: Efficiency (no dynamic allocation/deallocation).
    • Disadvantage: Storage is fixed for the entire execution time.
  • Fixed Stack-Dynamic Array:
    • Subscript ranges are statically bound, but allocation is done at declaration elaboration time during execution (on the stack).
    • Advantage: Space efficiency.
    • Disadvantage: Requires allocation/deallocation overhead.
  • Stack-Dynamic Array:
    • Binding of subscript ranges and storage allocation are bound at run time.
  • Fixed Heap-Dynamic Array:
    • Subscript ranges and storage bindings are fixed after storage is allocated.
    • Binding and allocation occur when the user program requests them during execution, and storage is allocated from the heap.
    • Advantage: Flexibility.
    • Disadvantage: Allocation time from the heap is typically longer than from the stack.
  • Heap-Dynamic Array:
    • Binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array’s lifetime.
    • Advantage: Maximum flexibility.
    • Disadvantage: Allocation/deallocation takes longer and may happen frequently.

C/C++ Heap Allocation and Deallocation

  • The standard C library functions malloc and free are used for general heap allocation and deallocation.
  • C++ uses the operators new and delete to manage heap storage.
  • An array is treated as a pointer to a collection of homogeneous storage cells, where the pointer can be indexed.

How Arrays are Handled in Java

Non-generic arrays in Java are fixed-heap dynamic. Java allows initialization during allocation, supports jagged arrays, and requires all elements to be of the same type.

Array Initialization Methods (Section 6.5.4)

  • C Integer Array: int list [] = { 4, 5, 6 };
  • C Character Array (String): char phrase [] = “CS320 group is the best!”;
  • C and C++ String Array: char *names [] = {“Bob”, “laptop”, “Django”};
  • Java String Array: String [] names = {“Bob”, “Jake”, “ETC”};

Typical Array Operations (p. 255)

Typical operations include assignments, concatenation, comparison for equality or inequality, and slices.

What is an Array Slice?

A slice is a substructure of an array; it is primarily a referencing mechanism. Slices are only useful in languages that support array operations.

Python Example:

vector = [5, 11, 252, 41, 78, 977] # index starts at 0
vector[2:4]
>> [252, 41]

Address Computation for Two-Dimensional Arrays

Given a two-dimensional array, the address of an element A[i][j] is computed based on storage order (assuming index starts at 0):

  • Row-Major Order: Address(A[i][j]) = BaseAddress + (i * NumCols + j) * ElementSize
  • Column-Major Order: Address(A[i][j]) = BaseAddress + (j * NumRows + i) * ElementSize

Row-Major and Column-Major Order

  • Row-Major Order: Elements are stored row by row. The elements that have the lower bound value of the first subscript are stored first, followed by the elements of the second value of the first subscript, and so forth.

    Example Matrix:

    3  4  7
    6  2  5
    1  3  8

    Stored in memory as: 3, 4, 7, 6, 2, 5, 1, 3, 8

  • Column-Major Order: Elements are stored column by column. The elements that have the lower bound value of the last subscript are stored first, followed by the elements of the second value of the last subscript, and so forth.

    Stored in memory as: 3, 6, 1, 4, 2, 3, 7, 5, 8

Associative Arrays

An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys (user-defined indexes).

Perl Example:

%hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65, …);
$hi_temps{"Wed"} = 83;

Design Issue: The form of references to their elements.