Programming Language Data Types Explained

Introduction to Data Types

A data type defines a collection of values and operations on those values. A descriptor is a set of attributes of a variable. An object is an instance of an abstract data type (user-defined or built-in). A key design issue for all data types is: What operations are provided for variables, and how are they specified?

Primitive Data Types Explained

  • Primitive types are not defined in terms of other types.
  • They often mirror hardware data types for performance and compatibility.

Integer Types

  • Common primitive numeric type.
  • Maps directly to hardware.
  • Java provides four signed integer sizes: byte (8-bit), short (16-bit), int (32-bit), long (64-bit).

Floating Point Types

  • Approximations of real numbers.
  • Common types: float, double.
  • IEEE 754 standard format is widely used.
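
The "approximation" point is easy to see in practice. The short sketch below assumes Python floats are IEEE 754 doubles (true on mainstream platforms); the variable names are illustrative only.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum
# is only an approximation of 0.3.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)            # False on IEEE 754 doubles
close_enough = math.isclose(total, 0.3)   # True: compares within a tolerance

print(total, exactly_equal, close_enough)
```

For this reason floating-point values are usually compared with a tolerance rather than with ==.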

Complex Number Types

  • Supported by C99, Fortran, Python.
  • Represented as a pair of floating-point values (real part, imaginary part); Python writes the literal as (7 + 3j).
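
As a small illustration in Python, one of the languages listed above, a complex value exposes its two floating-point parts and supports ordinary arithmetic. The names z, w, and product are placeholders chosen for this sketch.

```python
z = 7 + 3j            # complex literal: real part 7.0, imaginary part 3.0
w = complex(1, -2)    # same type, built from two numbers

# (7 + 3j)(1 - 2j) = 7 - 14j + 3j - 6j^2 = 13 - 11j
product = z * w

print(z.real, z.imag, product)
```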

Decimal Types

  • Used in business applications.
  • Store a fixed number of decimal digits, traditionally in Binary Coded Decimal (BCD).
  • Advantage: exact representation of decimal values such as money. Disadvantages: limited range and wasted memory.
  • Supported by COBOL, C#, F#.
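
The accuracy advantage can be sketched with Python's decimal module. This is a stand-in chosen for illustration: the module stores decimal digits exactly, but it is not claimed here to use BCD internally, and Python is not one of the languages listed above.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 exactly, so repeated
# addition drifts away from the exact total.
float_total = sum([0.10] * 3)

# Decimal keeps decimal digits, so the same sum is exact.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))

float_is_exact = (float_total == 0.30)
decimal_is_exact = (decimal_total == Decimal("0.30"))
print(float_is_exact, decimal_is_exact)
```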

Boolean Types

  • Two values: true or false.
  • Could be stored in a single bit, but usually occupy a byte, because a byte is the smallest unit most hardware can address efficiently.

Character Types

  • Stored as a numeric code (ASCII, Unicode).
  • Unicode UCS-2 (16-bit) and UCS-4 (32-bit) for global language support.
  • Supported by Java, JavaScript, Python, C#, C++.
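
A minimal sketch of "characters as numeric codes", using Python's built-in ord and chr. The UTF-16 and UTF-32 encodings are used here as stand-ins for the 16-bit and 32-bit forms mentioned above; for the character chosen, the UTF-16 encoding is a single 16-bit unit.

```python
code_a = ord("A")           # numeric code of "A" (65 in both ASCII and Unicode)
letter = chr(code_a + 1)    # arithmetic on the code, then back to a character

# A character outside ASCII still has one code point, but its
# stored size depends on the encoding.
euro = "\u20ac"                                # the euro sign, U+20AC
euro_code = ord(euro)
utf16_size = len(euro.encode("utf-16-be"))     # bytes used in a 16-bit encoding
utf32_size = len(euro.encode("utf-32-be"))     # bytes used in a 32-bit encoding
```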

Character String Types

A character string type has values that are sequences of characters. Depending on the language, strings are either a primitive type or built from arrays of characters.

Key Design Issues for Strings

  • Should strings be a primitive type or a special kind of array?
  • Should the length be static, limited dynamic, or dynamic?

String Length Types

  • Static Length: Fixed when the string is created (e.g., Java String objects, COBOL).
  • Limited Dynamic Length: Length can vary up to a declared maximum (e.g., C char[]).
  • Dynamic Length: Length can vary at runtime with no fixed maximum (e.g., JavaScript, Perl, C++ std::string).

Typical String Operations

  • Assignment, comparison, concatenation, substring reference, pattern matching.

String Support in Languages

  • C and C++:
    • Strings are not primitive types.
    • C uses null-terminated char arrays; standard operations from <string.h>.
    • C++ supports both C-style strings and the std::string class.
  • Python:
    • Strings are primitive, immutable types.
    • Built-in operators for concatenation (+), repetition (*), and slicing; pattern matching is provided by the re module.
  • Java:
    • Strings are objects of the String class (immutable).
    • StringBuffer and StringBuilder are mutable alternatives.
    • Methods: .substring(), .indexOf(), .compareTo(), .matches() (regex).
  • C# and Ruby:
    • Include string classes with rich libraries, similar to Java.
    • String operations include insertion, deletion, regex, slicing.
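
The Python entries above can be tried directly. This is only a sketch with made-up values; it shows the three built-in operators and what immutability means in practice.

```python
name = "data" + "type"      # concatenation
rule = "-" * 8              # repetition
prefix = name[:4]           # slicing

# Strings are immutable: assigning to one character raises TypeError.
try:
    name[0] = "D"
    mutated = True
except TypeError:
    mutated = False

print(name, rule, prefix, mutated)
```

To "change" an immutable string, a program builds a new one (for example "D" + name[1:]) and rebinds the variable.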

Many languages provide strong pattern matching support:

  • C++ (via regex libraries)
  • Java (java.util.regex)
  • Python (re module)
  • JavaScript (RegExp)
  • C# (.NET Regex)
  • Ruby (built-in =~, //)
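
As one example of library-based pattern matching, here is a short sketch with Python's re module. The pattern and the helper name is_identifier are hypothetical, chosen only to illustrate matching a whole string versus searching inside one.

```python
import re

# Hypothetical pattern: a simple identifier (a letter or underscore,
# followed by letters, digits, or underscores).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(text):
    """Return True if the whole string is a simple identifier."""
    return IDENTIFIER.fullmatch(text) is not None

# findall scans the string and returns every non-overlapping match.
words = IDENTIFIER.findall("total = price_1 + tax")
print(words)
```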

String Implementation Details

  • Static strings: Simple descriptors (pointer + fixed length).
  • Dynamic strings: Descriptors must support reallocation and track current size.
  • Languages may use reference counting or garbage collection to manage string memory.

Evaluating String Types

  • Primitive string types aid writability, and with static length they are inexpensive to provide.
  • Dynamic strings enhance flexibility but increase runtime overhead and complexity.

Implementing String Types

Static Length Strings

  • Require only a compile-time descriptor.
  • Descriptor stores the type name, the (fixed) length, and the memory address.

Limited Dynamic Strings

  • Require a run-time descriptor.
  • Must track current length, maximum allowed length, and memory location.
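  • Example: C-style strings with buffer limits. C is a special case: the null terminator marks the current length, so no runtime descriptor is stored and the program itself must respect the buffer limit.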
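
The three descriptor fields can be made concrete with a small model. This is a sketch only: the class name LimitedString and its methods are invented for illustration, and a Python list stands in for the fixed block of memory.

```python
class LimitedString:
    """Hypothetical limited dynamic string: fixed capacity, varying length."""

    def __init__(self, max_length, text=""):
        self.max_length = max_length        # maximum allowed length
        self.buffer = [""] * max_length     # stands in for the memory location
        self.length = 0                     # current length
        self.assign(text)

    def assign(self, text):
        # The run-time check that the descriptor makes possible.
        if len(text) > self.max_length:
            raise ValueError("string exceeds maximum length")
        self.buffer[:len(text)] = list(text)
        self.length = len(text)

    def value(self):
        # Only the first `length` cells are meaningful.
        return "".join(self.buffer[:self.length])


s = LimitedString(8, "hello")
s.assign("hi")          # shorter value reuses the same buffer
print(s.value(), s.length, len(s.buffer))
```

The buffer never grows; only the current-length field changes, which is what distinguishes this category from fully dynamic strings.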

Dynamic Length Strings

  • Require a simpler run-time descriptor than limited dynamic strings.
  • Descriptor typically stores just current length and pointer.
  • Require complex storage management to support reallocation at runtime.
  • Garbage collection or manual memory management is essential to avoid leaks.
  • String representation may vary across compilers and runtimes.
  • Tradeoffs involve speed (static is fastest) vs flexibility (dynamic allows resizing).

Enumeration Types

All possible values are listed as named constants. Example: enum colors {RED, GREEN, BLUE};

Design Questions for Enums

  • Can an enumeration constant appear in more than one type definition, and if so, how is the type of each occurrence checked?
  • Should coercion to/from integers be allowed?

Advantages of Enumerations

  • Improves program readability and reliability.
  • Prevents illegal values from being assigned.

C++ unscoped enums convert implicitly to integers, so arithmetic on them compiles; C# and Java do not coerce enum values to integers implicitly.

Evaluating Enumerated Types

  • Improves program readability: Developers don’t need to remember or assign numeric codes to meaningful values (e.g., colors, days).
  • Improves program reliability when the language type-checks enums strictly:
    • The compiler rejects values that are not members of the enum type.
    • Arithmetic on enum values is disallowed (e.g., two days cannot be added together).
    • Values outside the defined range cannot be assigned.

Enum Support in Languages

  • C++ allows enum variables to be treated as integers (coercion).
  • C#, F#, Swift, and Java 5.0+ provide stronger type-checking:
    • Enum types are not implicitly coerced into integers.
    • More reliable enforcement of enum constraints at compile-time.

These features help prevent bugs and improve code correctness.
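
The "no implicit coercion" behavior can be sketched with Python's enum module. Python is not one of the languages listed above and performs these checks at runtime rather than at compile time, so treat this as an analogy; the Color type is a made-up example.

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

# Members are not integers: comparing with the underlying value is False.
same_as_int = (Color.RED == 1)

# Arithmetic on members is rejected.
try:
    Color.RED + Color.GREEN
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

# Conversion from an integer is explicit and checked against the members.
green = Color(2)
try:
    Color(7)
    out_of_range_accepted = True
except ValueError:
    out_of_range_accepted = False

print(same_as_int, arithmetic_allowed, green, out_of_range_accepted)
```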

Array Types

An array is a homogeneous aggregate of elements identified by position relative to the first. All array elements must be of the same type.

Array Design Issues

  • What types are legal for subscripts?
  • Are subscript expressions range checked?
  • When are subscript ranges bound?
  • When does array allocation take place?
  • Are ragged or rectangular multidimensional arrays allowed?
  • Can arrays be initialized at the time of storage allocation?
  • What kinds of slices are supported?

Array Indexing

  • Indexing is a mapping from indices to elements.
  • Syntax: Fortran and Ada use parentheses (); most others use brackets [].
  • Ada uses parentheses to emphasize that arrays and function calls are both mappings.

Index Types and Range Checking

  • Most common index types: integers.
  • Fortran, C, Java: integer types only.
  • Range checking improves reliability.
  • C, C++, Perl, and Fortran do not specify range checking.
  • Java, ML, and C# specify range checking.
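
What a range check amounts to can be sketched in a few lines. The helper checked_get is hypothetical; Python's own indexing already raises IndexError for indices that are too large, but it accepts negative indices as counting from the end, which this stricter sketch rejects.

```python
def checked_get(values, index):
    """Return values[index], rejecting any index outside 0..len(values)-1."""
    if not 0 <= index < len(values):
        raise IndexError(f"index {index} out of range 0..{len(values) - 1}")
    return values[index]


scores = [90, 75, 60]
print(checked_get(scores, 2))
```

A language that enforces range checking effectively inserts a test like this before every subscript, unless the compiler can prove the index is in range.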

Array Categories by Binding

  1. Static Arrays:
    • Subscript ranges and storage are fixed before runtime.
    • Pros: very efficient.
    • Cons: memory cannot be reused.
  2. Fixed Stack-Dynamic Arrays:
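    • Subscript ranges are bound before runtime; storage is allocated when the declaration is elaborated at runtime.
    • Pros: space-efficient, since the storage can be reused after the enclosing block exits.
    • Cons: allocation and deallocation cost at runtime.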
  3. Fixed Heap-Dynamic Arrays:
    • Subscript ranges and storage are bound when the program requests them at runtime and stay fixed afterward; storage comes from the heap.
    • Pros: fits exact size requirements.
    • Cons: slower heap allocation.
  4. Heap-Dynamic Arrays:
    • Fully dynamic subscripts and allocation, may change multiple times.
    • Pros: most flexible.
    • Cons: higher overhead.

Array Examples in Languages

  • C/C++ (arrays declared inside functions):
    • With ‘static’: static arrays.
    • Without ‘static’: fixed stack-dynamic.
  • Java and C#: fixed heap-dynamic arrays.
  • Perl, Python, Ruby, JS: heap-dynamic arrays.

Array Initialization

C, C++, Java, Swift, and C# allow an array to be initialized at the time its storage is allocated.

  • Examples:
    • C/C++: int list[] = {4, 5, 7, 83};
    • C/C++ (character string, null-terminated): char name[] = "Freddie";
    • Java: String[] names = {"Bob", "Jake", "Darcie"};

Array Operations

  • Typical operations: assignment, concatenation, comparison, slicing.
  • C-style: no built-in operations, handled via libraries.
  • APL: most extensive built-in array/vector/matrix ops.
  • Python: lists (dynamic arrays) support extensive operations.
  • Ruby: supports array concatenation.
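
The typical operations listed above look like this on Python lists. The values are arbitrary; the point to notice is the difference between copying and aliasing on assignment.

```python
a = [1, 2, 3]
b = [4, 5]

joined = a + b              # concatenation
middle = joined[1:4]        # slicing: elements at positions 1, 2, 3
equal = (a == [1, 2, 3])    # comparison is element by element

alias = a                   # plain assignment: both names refer to one list
copy = a[:]                 # a full slice makes an independent copy
copy.append(99)             # changing the copy leaves `a` untouched

print(joined, middle, equal, a, copy)
```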

Implementing Arrays

  • Implementing arrays requires more compile-time effort than implementing primitive types.
  • Code for element access must be generated at compile time and executed at runtime.

Accessing 1D Arrays

  • If lower bound is 0: address(list[k]) = base + k * element_size
  • General case: address(list[k]) = base + (k - lb) * element_size
  • Compile-time descriptor contains: base address, size, bounds.
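
The access function above translates directly into code. The function name and the sample base address are assumptions for illustration; real compilers emit this arithmetic inline rather than calling a function.

```python
def element_address(base, index, element_size, lower_bound=0):
    """Address of list[index] for a contiguously stored 1D array."""
    return base + (index - lower_bound) * element_size


# 4-byte elements starting at (hypothetical) address 1000
zero_based = element_address(1000, 3, 4)                 # 1000 + 3 * 4
one_based = element_address(1000, 3, 4, lower_bound=1)   # 1000 + (3 - 1) * 4
print(zero_based, one_based)
```

When the lower bound is known at compile time, the term lower_bound * element_size can be folded into the base address, leaving one multiplication and one addition per access.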

Accessing Multi-Dimensional Arrays

  • Represented in linear memory.
  • Row-major order: C, Java.
  • Column-major order: Fortran.
  • Address mapping (row-major order): location(a[i,j]) = base + (((i - row_lb) * n) + (j - col_lb)) * element_size, where n = number of elements per row.
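
Here is a sketch of the row-major mapping next to its column-major counterpart. The column-major formula is not given above; it is the standard mirror image, with the roles of rows and columns swapped. Function and parameter names are illustrative.

```python
def row_major_address(base, i, j, n_cols, element_size, row_lb=0, col_lb=0):
    """Rows are stored one after another (C, Java)."""
    return base + ((i - row_lb) * n_cols + (j - col_lb)) * element_size

def column_major_address(base, i, j, n_rows, element_size, row_lb=0, col_lb=0):
    """Columns are stored one after another (Fortran)."""
    return base + ((j - col_lb) * n_rows + (i - row_lb)) * element_size


# A 3 x 4 array of 4-byte elements at (hypothetical) address 1000
rm = row_major_address(1000, 1, 2, n_cols=4, element_size=4)      # skips 1 row + 2 elements
cm = column_major_address(1000, 1, 2, n_rows=3, element_size=4)   # skips 2 columns + 1 element
print(rm, cm)
```

The same element lands at different addresses under the two layouts, which is why loops should traverse an array in the order its language stores it.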

Associative Arrays

An associative array is an unordered collection indexed by user-defined keys. Each element is a key-value pair.

Associative Array Design Issues

  • What is the form of references to elements?
  • Is the size static or dynamic?

Associative Array Support

  • Built-in support: Perl, Python, Ruby, Swift.
  • Standard libraries: Java (HashMap<K,V>), C++, C#, F#.
  • Python and Swift call them dictionaries.

Associative Arrays in Perl

  • Called hashes; use hash functions internally.
  • Variable names begin with %.
  • Keys: strings; Values: scalars (numbers, strings, references).
  • Literal assignment example: %salaries = ("Gary" => 75000, "Perry" => 57000, "Mary" => 55750, "Cedric" => 47850);
  • Assign or update a value by key: $salaries{"Perry"} = 58850;
  • Remove an entry: delete $salaries{"Gary"};
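
For comparison, the same three steps can be written with a Python dictionary. This mirrors the Perl example's data; the variable has_gary is added only to show a key lookup.

```python
# Literal with the same key-value pairs as the Perl hash
salaries = {"Gary": 75000, "Perry": 57000, "Mary": 55750, "Cedric": 47850}

salaries["Perry"] = 58850        # assign or update a value by key
del salaries["Gary"]             # remove an entry
has_gary = "Gary" in salaries    # membership test on keys

print(salaries, has_gary)
```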

List Types

Lists are fundamental in functional languages (Lisp, Scheme, ML).

  • Lisp uses parentheses: '(A B C)

List Operations

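  • CAR: returns the first element of a list
  • CDR: returns the list without its first element (the tail)
  • CONS: returns a new list with a given element placed in front of an existing list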
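
The three operations can be imitated with Python lists. This is only a behavioral sketch: Lisp builds lists from linked cells, whereas the slicing and concatenation below copy elements.

```python
def car(lst):
    """First element of a non-empty list."""
    return lst[0]

def cdr(lst):
    """Everything after the first element."""
    return lst[1:]

def cons(head, tail):
    """New list with head placed in front of tail; tail is not modified."""
    return [head] + tail


abc = cons("A", ["B", "C"])
print(abc, car(abc), cdr(abc))
```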

List Support in Languages

  • ML & F#: ML writes [1, 2, 3]; F# separates elements with semicolons, [1; 2; 3]; both cons with '::'
  • Python: lists are mutable and heterogeneous
    • List comprehension: [x*x for x in range(10) if x%2 == 0]
  • Haskell: [x*x | x <- [1..10]]
  • Java: the List interface and ArrayList class; C#: List<T> and ArrayList
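
To make the comprehension syntax concrete, the Python example above is shown next to an equivalent explicit loop. The variable names are illustrative.

```python
# Squares of the even numbers in 0..9, as a comprehension
even_squares = [x * x for x in range(10) if x % 2 == 0]

# The same computation written as a loop
loop_result = []
for x in range(10):
    if x % 2 == 0:
        loop_result.append(x * x)

print(even_squares)
```

The comprehension reads like set-builder notation: an output expression, a source of values, and an optional filter.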