Programming Language Data Types Explained

Posted on May 6, 2025 in Computers

Introduction to Data Types

A data type defines a collection of values and operations on those values. A descriptor is a set of attributes of a variable. An object is an instance of an abstract data type (user-defined or built-in). A key design issue for all data types is: What operations are provided for variables, and how are they specified?

Primitive Data Types Explained

Primitive types are not defined in terms of other types.
They often mirror hardware data types for performance and compatibility.

Integer Types

The most common primitive numeric type.
Usually maps directly onto the hardware's integer representation.
Java provides four signed integer sizes: byte, short, int, long.

Floating Point Types

Approximations of real numbers.
Common types: float, double.
IEEE 754 standard format is widely used.
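
Because floating-point values are approximations, exact equality tests can surprise. The following is a minimal Python sketch of that effect (Python floats are IEEE 754 doubles on mainstream platforms); the variable names are illustrative only.

```python
import math

total = 0.1 + 0.2                         # neither operand is exactly representable in binary
exactly_equal = (total == 0.3)            # exact comparison of two approximations
close_enough = math.isclose(total, 0.3)   # comparison within a relative tolerance

print(total)
print(exactly_equal)
print(close_enough)
```

Comparing with a tolerance, rather than with ==, is the usual way to work around the approximation.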

Complex Number Types

Supported by C99, Fortran, Python.
Represented as two floats: (real + imag*j) e.g., (7 + 3j).
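
As a small illustration in Python (one of the languages listed above), a complex value carries a real part and an imaginary part, each stored as a float. The variable names are illustrative only.

```python
z = 7 + 3j                 # complex literal: real part 7, imaginary part 3

real_part = z.real         # 7.0
imag_part = z.imag         # 3.0
rotated = z * 1j           # multiplying by j rotates by 90 degrees: (-3+7j)
magnitude = abs(3 + 4j)    # 5.0

print(real_part, imag_part, rotated, magnitude)
```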

Decimal Types

Used in business applications.
Stores a fixed number of decimal digits, traditionally encoded in Binary Coded Decimal (BCD).
Precise for money; limited range; memory inefficient.
Supported by COBOL, C#, F#.
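
To see why decimal types suit money, the sketch below compares binary floats with exact decimal arithmetic. Python's decimal module is used purely as a stand-in for a decimal type; it is not claimed to use BCD internally, and the payment data is made up for illustration.

```python
from decimal import Decimal

payments = ["0.10"] * 10    # hypothetical data: ten payments of ten cents, as text

float_total = sum(float(p) for p in payments)      # binary floating point
decimal_total = sum(Decimal(p) for p in payments)  # exact decimal digits

print(float_total == 1.0)                 # False: rounding error accumulates
print(decimal_total == Decimal("1.00"))   # True: the decimal sum is exact
```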

Boolean Types

Two values: true or false.
Could be stored in a single bit, but usually occupies a byte because individual bits are not directly addressable on most hardware.

Character Types

Stored as a numeric code (ASCII, Unicode).
Unicode encodings such as UCS-2 (16-bit) and UCS-4 (32-bit) cover characters from most of the world's languages.
Supported by Java, JavaScript, Python, C#, C++.
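
The mapping between characters and numeric codes can be observed directly. This Python sketch uses the built-in ord and chr functions plus str.encode; the variable names are illustrative only.

```python
letter = "A"
code_point = ord(letter)            # 65, the same code in ASCII and Unicode
round_trip = chr(code_point)        # back to "A"

euro = "\u20ac"                     # the euro sign, code point U+20AC
utf8_bytes = euro.encode("utf-8")        # variable-width encoding: 3 bytes here
utf16_bytes = euro.encode("utf-16-be")   # 2 bytes (one 16-bit unit)
utf32_bytes = euro.encode("utf-32-be")   # 4 bytes (one 32-bit unit)

print(code_point, round_trip)
print(len(utf8_bytes), len(utf16_bytes), len(utf32_bytes))
```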

Character String Types

A string is a sequence of characters. Depending on the language, the string type may or may not be primitive.

Key Design Issues for Strings

Should strings be a primitive type or a special kind of array?
Should the length be static, limited dynamic, or dynamic?

String Length Types

Static Length: Fixed when the string is created (e.g., COBOL, Java String objects, Python str).
Limited Dynamic Length: Allows length change up to a declared maximum (e.g., C char[]).
Dynamic Length: Can change at runtime with no fixed maximum (e.g., JavaScript, Perl, C++ std::string).

Typical String Operations

Assignment, comparison, concatenation, substring reference, pattern matching.
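
Each of these operations can be shown in a few lines. The sketch below uses Python, with pattern matching supplied by the standard re module; the variable names are illustrative only.

```python
import re

greeting = "hello"                    # assignment
same = (greeting == "hello")          # comparison for equality
ordered = ("apple" < "banana")        # lexicographic comparison
message = greeting + ", world"        # concatenation
part = message[7:12]                  # substring reference (slice)
match = re.search(r"w\w+", message)   # pattern matching: a word starting with "w"

print(same, ordered)
print(message)
print(part)
print(match.group())
```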

String Support in Languages

C and C++:
- Strings are not primitive types.
- C uses null-terminated char arrays; standard operations from <string.h>.
- C++ supports both C-style strings and the std::string class.
Python:
- Strings are primitive, immutable types.
- Built-in operators for concatenation (+), repetition (*), and slicing; pattern matching is provided by the re module.
Java:
- Strings are objects of the String class (immutable).
- StringBuffer and StringBuilder are mutable alternatives.
- Methods: .substring(), .indexOf(), .compareTo(), .matches() (regex).
C# and Ruby:
- Include string classes with rich libraries, similar to Java.
- String operations include insertion, deletion, regex, slicing.

Many languages provide strong pattern matching support:

C++ (the standard <regex> library, since C++11)
Java (java.util.regex)
Python (re module)
JavaScript (RegExp)
C# (.NET Regex)
Ruby (built-in =~, //)

String Implementation Details

Static strings: Simple descriptors (pointer + fixed length).
Dynamic strings: Descriptors must support reallocation and track current size.
Languages may use reference counting or garbage collection to manage string memory.

Evaluating String Types

Providing strings as a primitive type aids writability, and static-length strings are inexpensive to implement.
Dynamic strings enhance flexibility but increase runtime overhead and complexity.

Implementing String Types

Static Length Strings

Require a compile-time descriptor.
Descriptor stores the (fixed) length and the memory address.

Limited Dynamic Strings

Require a run-time descriptor.
Must track current length, maximum allowed length, and memory location.
Examples: C-style strings with buffer limits.
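
A run-time descriptor for a limited dynamic string can be modeled as a small record. The class below is a hypothetical toy model in Python, not how any particular compiler lays out strings: it tracks the three items listed above and rejects assignments that exceed the limit.

```python
class LimitedDynamicString:
    """Hypothetical toy model of a limited dynamic string and its descriptor."""

    def __init__(self, max_length, initial=""):
        self.max_length = max_length        # maximum allowed length
        self.storage = [""] * max_length    # stands in for the memory location
        self.length = 0                     # current length
        self.assign(initial)

    def assign(self, text):
        # The length may change, but never beyond the declared maximum.
        if len(text) > self.max_length:
            raise ValueError("string exceeds maximum length")
        self.storage[:len(text)] = list(text)
        self.length = len(text)

    def value(self):
        # Only the first `length` cells are meaningful.
        return "".join(self.storage[:self.length])


name = LimitedDynamicString(8, "cat")
name.assign("giraffe")              # 7 characters: fits within the limit of 8
print(name.value(), name.length)

try:
    name.assign("hippopotamus")     # 12 characters: exceeds the limit
except ValueError as error:
    print("rejected:", error)
```

A dynamic length string would drop max_length and instead reallocate storage when the text outgrows it.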

Dynamic Length Strings

Require a simpler run-time descriptor than limited dynamic strings.
Descriptor typically stores just current length and pointer.
Require complex storage management to support reallocation at runtime.
Garbage collection or manual memory management is essential to avoid leaks.
String representation may vary across compilers and runtimes.
Tradeoffs involve speed (static is fastest) vs flexibility (dynamic allows resizing).

Enumeration Types

All possible values are listed as named constants. Example: enum colors {RED, GREEN, BLUE};

Design Questions for Enums

Can the same enumeration constant appear in more than one type definition, and if so, how is its type determined?
Should coercion to/from integers be allowed?

Advantages of Enumerations

Improves program readability and reliability.
Prevents illegal values from being assigned.

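C++ unscoped enums convert implicitly to integers, so arithmetic on them compiles; Java enums are not coerced to integers, and C# requires an explicit cast.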

Evaluating Enumerated Types

Improves program readability: Developers don’t need to remember or assign numeric codes to meaningful values (e.g., colors, days).
Improves program reliability:
- Compiler checks ensure only valid enum values are used.
- Disallows arithmetic operations on enum types (e.g., cannot add two days together).
- Prevents assigning values outside of defined range.

Enum Support in Languages

C++ allows enum variables to be treated as integers (coercion).
C#, F#, Swift, and Java 5.0+ provide stronger type-checking:
- Enum types are not implicitly coerced into integers.
- More reliable enforcement of enum constraints at compile-time.

These features help prevent bugs and improve code correctness.
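
The reliability points above can be illustrated with Python's standard enum module. One caveat: Python enforces these rules at run time rather than at compile time, so this is only an analogy to the compile-time checks in C#, Swift, or Java. The Color type and helper functions are hypothetical names chosen for the example.

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

def rejects_arithmetic():
    """Return True if adding two enum members is refused."""
    try:
        Color.RED + Color.GREEN
    except TypeError:
        return True
    return False

def rejects_out_of_range():
    """Return True if a value outside the defined members is refused."""
    try:
        Color(7)
    except ValueError:
        return True
    return False

print(Color.RED == 1)            # False: no implicit coercion to int
print(rejects_arithmetic())      # True
print(rejects_out_of_range())    # True
```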

Array Types

An array is a homogeneous aggregate of elements identified by position relative to the first. All array elements must be of the same type.

Array Design Issues

What types are legal for subscripts?
Are subscript expressions range checked?
When are subscript ranges bound?
When does array allocation take place?
Are ragged or rectangular multidimensional arrays allowed?
Can arrays be initialized at the time of storage allocation?
What kinds of slices are supported?

Array Indexing

Indexing is a mapping from indices to elements.
Syntax: Fortran and Ada use parentheses (); most others use brackets [].
Ada uses parentheses to emphasize that arrays and function calls are both mappings.

Index Types and Range Checking

Most common index type: integer.
Fortran, C, Java: integer types only.
Range checking improves reliability.
Range checking not specified by the language: C, C++, Perl, Fortran.
Range checking required by the language: Java, ML, C#.
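
Python is another language whose runtime checks subscript ranges. The sketch below shows the check firing; checked_get is a hypothetical helper written for this example.

```python
values = [10, 20, 30]           # valid subscripts are 0, 1, 2

def checked_get(items, index):
    """Hypothetical helper: return items[index], or None if out of range."""
    try:
        return items[index]
    except IndexError:          # raised by Python's built-in range check
        return None

print(checked_get(values, 1))   # 20
print(checked_get(values, 3))   # None
```

In C, by contrast, reading values[3] from a three-element array is undefined behavior: the language does not require any check.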

Array Categories by Binding

Static Arrays:
- Subscript ranges and storage are fixed before runtime.
- Pros: very efficient.
- Cons: storage stays allocated for the entire run, so it cannot be reused.
Fixed Stack-Dynamic Arrays:
- Subscript ranges are bound before runtime, but storage is allocated when the declaration is reached during execution.
- Pros: space-efficient.
- Cons: requires runtime management.
Fixed Heap-Dynamic Arrays:
- Subscript ranges and storage are bound when the program requests them at runtime, allocated from the heap, and fixed thereafter.
- Pros: fits exact size requirements.
- Cons: slower heap allocation.
Heap-Dynamic Arrays:
- Subscript ranges and storage are bound dynamically and may change any number of times during the array's lifetime.
- Pros: most flexible.
- Cons: higher overhead.

Array Examples in Languages

C/C++:
- Arrays declared with the static modifier are static arrays.
- Local arrays declared without static are fixed stack-dynamic.
Java and C#: fixed heap-dynamic arrays.
Perl, Python, Ruby, JS: heap-dynamic arrays.
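
A heap-dynamic array can grow and shrink during its lifetime. The following Python sketch shows a list doing both; the scores data is made up for illustration.

```python
scores = [90, 85]            # starts with two elements
scores.append(70)            # grows by one:  [90, 85, 70]
scores.extend([60, 50])      # grows by two:  [90, 85, 70, 60, 50]
del scores[0]                # shrinks:       [85, 70, 60, 50]
last = scores.pop()          # shrinks again: [85, 70, 60], last == 50

print(scores, last)
```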

Array Initialization

C, C++, Java, Swift, and C# support inline initialization.

Examples:
- int list[] = {4, 5, 7, 83};
- char name[] = "Freddie";
- String[] names = {"Bob", "Jake", "Darcie"};

Array Operations

Typical operations: assignment, concatenation, comparison, slicing.
C-based languages: no built-in array operations; libraries (or methods, in Java and C#) provide them.
APL: most extensive built-in array/vector/matrix ops.
Python: lists (dynamic arrays) support extensive operations.
Ruby: supports array concatenation.

Implementing Arrays

Requires more compile-time logic than primitives.
Element access logic must be generated at compile time and evaluated at runtime.

Accessing 1D Arrays

If lower bound is 0: address(list[k]) = base + k * element_size
General case: address(list[k]) = base + (k - lb) * element_size
Compile-time descriptor contains: base address, size, bounds.
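
The access function above can be written out as ordinary arithmetic. This Python sketch only computes the address a compiler would generate; element_address and its parameter names are hypothetical.

```python
def element_address(base, k, element_size, lower_bound=0):
    """Address of list[k] per the general-case formula above."""
    return base + (k - lower_bound) * element_size

# Hypothetical array of 4-byte elements starting at address 1000.
print(element_address(1000, 3, 4))                  # 1012 with lower bound 0
print(element_address(1000, 3, 4, lower_bound=1))   # 1008 with lower bound 1
```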

Accessing Multi-Dimensional Arrays

Represented in linear memory.
Row-major order: C, Java.
Column-major order: Fortran.
Address mapping (row-major order): location(a[i,j]) = base + ((i - row_lb) * n + (j - col_lb)) * element_size, where n is the number of elements per row (that is, the number of columns).
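
The two storage orders differ only in which subscript is scaled by a dimension size. The Python sketch below computes both mappings; the function and parameter names are hypothetical, and the column-major version is the standard analogue of the row-major formula rather than something stated above.

```python
def row_major_address(base, i, j, n_cols, element_size, row_lb=0, col_lb=0):
    """Row-major: whole rows are stored one after another."""
    return base + ((i - row_lb) * n_cols + (j - col_lb)) * element_size

def column_major_address(base, i, j, n_rows, element_size, row_lb=0, col_lb=0):
    """Column-major: whole columns are stored one after another."""
    return base + ((j - col_lb) * n_rows + (i - row_lb)) * element_size

# Hypothetical 3 x 4 array of 4-byte elements starting at address 1000.
print(row_major_address(1000, 1, 2, n_cols=4, element_size=4))      # 1024
print(column_major_address(1000, 1, 2, n_rows=3, element_size=4))   # 1028
```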

Associative Arrays

An associative array is an unordered collection indexed by user-defined keys. Each element is a key-value pair.

Associative Array Design Issues

What is the form of references to elements?
Is the size static or dynamic?

Associative Array Support

Built-in support: Perl, Python, Ruby, Swift.
Standard libraries: Java (HashMap<K,V>), C++ (std::map, std::unordered_map), C# (Dictionary<TKey,TValue>), F#.
Python and Swift call them dictionaries.

Associative Arrays in Perl

Called hashes; use hash functions internally.
Variable names begin with %.
Keys: strings; Values: scalars (numbers, strings, references).
Literal assignment example: %salaries = ("Gary" => 75000, "Perry" => 57000, "Mary" => 55750, "Cedric" => 47850);
Access a value by key (here, assigning a new value): $salaries{"Perry"} = 58850;
Remove an entry: delete $salaries{"Gary"};
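
For comparison, the same three steps can be sketched with a Python dictionary. This mirrors the Perl example above using the same data; it is an equivalent in another language, not a translation of Perl's internals.

```python
# Literal assignment
salaries = {"Gary": 75000, "Perry": 57000, "Mary": 55750, "Cedric": 47850}

salaries["Perry"] = 58850        # update a value by key
del salaries["Gary"]             # remove an entry

has_gary = "Gary" in salaries    # membership test on keys
names = sorted(salaries)         # remaining keys, sorted for a stable listing

print(has_gary)
print(names)
print(salaries["Perry"])
```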

List Types

Lists are fundamental in functional languages (Lisp, Scheme, ML).

Lisp uses parentheses: '(A B C)

List Operations

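CAR: returns the first element of a list
CDR: returns the rest of the list after its first element
CONS: returns a new list with the given element added at the head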
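
These three operations are easy to mimic on Python lists. The functions below are hypothetical stand-ins that copy rather than share structure, so they illustrate the behavior of CAR, CDR, and CONS but not Lisp's linked-cell implementation.

```python
def car(lst):
    """First element of a non-empty list."""
    return lst[0]

def cdr(lst):
    """Everything after the first element, as a new list."""
    return lst[1:]

def cons(head, lst):
    """New list with head placed in front; the original is unchanged."""
    return [head] + lst

items = ["A", "B", "C"]
print(car(items))          # A
print(cdr(items))          # ['B', 'C']
print(cons("Z", items))    # ['Z', 'A', 'B', 'C']
```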

List Support in Languages

ML: [1, 2, 3]; F#: [1; 2; 3]; both cons with '::'
Python: lists are mutable and heterogeneous
- List comprehension: [x*x for x in range(10) if x%2 == 0]
Haskell: [x*x | x <- [1..10]]
Java: List interface with implementations such as ArrayList; C#: List<T>
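
The Python comprehension shown above is shorthand for a filtered loop. This sketch evaluates it next to an equivalent explicit loop, and adds a Python rendering of the Haskell comprehension for comparison; the variable names are illustrative only.

```python
# The comprehension from the text: squares of the even numbers below 10.
squares_of_evens = [x * x for x in range(10) if x % 2 == 0]

# The same computation as an explicit loop.
loop_result = []
for x in range(10):
    if x % 2 == 0:
        loop_result.append(x * x)

# Python equivalent of the Haskell [x*x | x <- [1..10]].
squares_1_to_10 = [x * x for x in range(1, 11)]

print(squares_of_evens)    # [0, 4, 16, 36, 64]
print(loop_result == squares_of_evens)
print(squares_1_to_10)
```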