Programming Language Data Types Explained
Introduction to Data Types
A data type defines a collection of values and operations on those values. A descriptor is a set of attributes of a variable. An object is an instance of an abstract data type (user-defined or built-in). A key design issue for all data types is: What operations are provided for variables, and how are they specified?
Primitive Data Types Explained
- Primitive types are not defined in terms of other types.
- They often mirror hardware data types for performance and compatibility.
Integer Types
- Common primitive numeric type.
- Maps directly to hardware.
- Example sizes in Java:
byte
,short
,int
,long
.
Floating Point Types
- Approximations of real numbers.
- Common types:
float
,double
. - IEEE 754 standard format is widely used.
Complex Number Types
- Supported by C99, Fortran, Python.
- Represented as two floats: (real + imag*j) e.g., (7 + 3j).
Decimal Types
- Used in business applications.
- Stores a fixed number of decimal digits using Binary Coded Decimal (BCD).
- Precise for money; limited range; memory inefficient.
- Supported by COBOL, C#, F#.
Boolean Types
- Two values:
true
orfalse
. - Stored as a byte or bit (bit usually inaccessible).
Character Types
- Stored as a numeric code (ASCII, Unicode).
- Unicode UCS-2 (16-bit) and UCS-4 (32-bit) for global language support.
- Supported by Java, JavaScript, Python, C#, C++.
Character String Types
A string is a sequence of characters that may or may not be a primitive type.
Key Design Issues for Strings
- Should strings be a primitive type or a special kind of array?
- Should the length be static, limited dynamic, or dynamic?
String Length Types
- Static Length: Fixed at declaration (e.g., Java
char[]
, COBOL). - Limited Dynamic Length: Allows length change up to a limit (e.g., C
char[]
). - Dynamic Length: Can change at runtime (e.g., JavaScript, Python, Perl).
Typical String Operations
- Assignment, comparison, concatenation, substring reference, pattern matching.
String Support in Languages
- C and C++:
- Strings are not primitive types.
- C uses null-terminated char arrays; standard operations from
<string.h>
. - C++ supports both C-style strings and the
std::string
class.
- Python:
- Strings are primitive, immutable types.
- Built-in operators for concatenation (
+
), repetition (*
), slicing, and pattern matching.
- Java:
- Strings are objects of the
String
class (immutable). StringBuffer
andStringBuilder
are mutable alternatives.- Methods:
.substring()
,.indexOf()
,.compareTo()
,.matches()
(regex).
- Strings are objects of the
- C# and Ruby:
- Include string classes with rich libraries, similar to Java.
- String operations include insertion, deletion, regex, slicing.
Many languages provide strong pattern matching support:
- C++ (via regex libraries)
- Java (
java.util.regex
) - Python (
re
module) - JavaScript (
RegExp
) - C# (.NET Regex)
- Ruby (built-in
=~
,//
)
String Implementation Details
- Static strings: Simple descriptors (pointer + fixed length).
- Dynamic strings: Descriptors must support reallocation and track current size.
- Languages may use reference counting or garbage collection to manage string memory.
Evaluating String Types
- Primitive strings simplify compiler design and allow optimizations.
- Dynamic strings enhance flexibility but increase runtime overhead and complexity.
Implementing String Types
Static Length Strings
- Require a compile-time descriptor.
- Descriptor stores the maximum size and memory address.
Limited Dynamic Strings
- Require a run-time descriptor.
- Must track current length, maximum allowed length, and memory location.
- Examples: C-style strings with buffer limits.
Dynamic Length Strings
- Require a simpler run-time descriptor than limited dynamic strings.
- Descriptor typically stores just current length and pointer.
- Require complex storage management to support reallocation at runtime.
- Garbage collection or manual memory management is essential to avoid leaks.
- String representation may vary across compilers and runtimes.
- Tradeoffs involve speed (static is fastest) vs flexibility (dynamic allows resizing).
Enumeration Types
All possible values are listed as named constants. Example: enum colors {RED, GREEN, BLUE};
Design Questions for Enums
- Can enums be reused across types?
- Should coercion to/from integers be allowed?
Advantages of Enumerations
- Improves program readability and reliability.
- Prevents illegal values from being assigned.
C++ allows arithmetic ops on enums; C# and Java do not.
Evaluating Enumerated Types
- Improves program readability: Developers don’t need to remember or assign numeric codes to meaningful values (e.g., colors, days).
- Improves program reliability:
- Compiler checks ensure only valid enum values are used.
- Disallows arithmetic operations on enum types (e.g., cannot add two days together).
- Prevents assigning values outside of defined range.
Enum Support in Languages
- C++ allows enum variables to be treated as integers (coercion).
- C#, F#, Swift, and Java 5.0+ provide stronger type-checking:
- Enum types are not implicitly coerced into integers.
- More reliable enforcement of enum constraints at compile-time.
These features help prevent bugs and improve code correctness.
Array Types
An array is a homogeneous aggregate of elements identified by position relative to the first. All array elements must be of the same type.
Array Design Issues
- What types are legal for subscripts?
- Are subscript expressions range checked?
- When are subscript ranges bound?
- When does array allocation take place?
- Are ragged or rectangular multidimensional arrays allowed?
- Can arrays be initialized at the time of storage allocation?
- What kinds of slices are supported?
Array Indexing
- Indexing is a mapping from indices to elements.
- Syntax: Fortran and Ada use parentheses
()
; most others use brackets[]
. - Ada uses parentheses to emphasize that arrays and function calls are both mappings.
Index Types and Range Checking
- Most common index types: integers.
- FORTRAN, C, Java: integer types only.
- Range checking improves reliability.
- Not specified: C, C++, Perl, Fortran.
- Enforced: Java, ML, C#.
Array Categories by Binding
- Static Arrays:
- Subscript ranges and storage are fixed before runtime.
- Pros: very efficient.
- Cons: memory cannot be reused.
- Fixed Stack-Dynamic Arrays:
- Subscript ranges bound before runtime, allocation at declaration.
- Pros: space-efficient.
- Cons: requires runtime management.
- Fixed Heap-Dynamic Arrays:
- Subscripts and storage bound at user request, allocated from heap.
- Pros: fits exact size requirements.
- Cons: slower heap allocation.
- Heap-Dynamic Arrays:
- Fully dynamic subscripts and allocation, may change multiple times.
- Pros: most flexible.
- Cons: higher overhead.
Array Examples in Languages
- C/C++:
- With ‘static’: static arrays.
- Without ‘static’: fixed stack-dynamic.
- Java and C#: fixed heap-dynamic arrays.
- Perl, Python, Ruby, JS: heap-dynamic arrays.
Array Initialization
C, C++, Java, Swift, and C# support inline initialization.
- Examples:
int list[] = {4, 5, 7, 83};
char name[] = "Freddie";
String[] names = {"Bob", "Jake", "Darcie"};
Array Operations
- Typical operations: assignment, concatenation, comparison, slicing.
- C-style: no built-in operations, handled via libraries.
- APL: most extensive built-in array/vector/matrix ops.
- Python: lists (dynamic arrays) support extensive operations.
- Ruby: supports array concatenation.
Implementing Arrays
- Requires more compile-time logic than primitives.
- Element access logic must be generated at compile time and evaluated at runtime.
Accessing 1D Arrays
- If lower bound is 0:
address(list[k]) = base + k * element_size
- General case:
address(list[k]) = base + (k - lb) * element_size
- Compile-time descriptor contains: base address, size, bounds.
Accessing Multi-Dimensional Arrays
- Represented in linear memory.
- Row-major order: C, Java.
- Column-major order: Fortran.
- Address mapping:
location(a[i,j]) = base + (((i - row_lb) * n) + (j - col_lb)) * element_size
wheren
= number of elements per row.
Associative Arrays
An associative array is an unordered collection indexed by user-defined keys. Each element is a key-value pair.
Associative Array Design Issues
- What is the form of references to elements?
- Is the size static or dynamic?
Associative Array Support
- Built-in support: Perl, Python, Ruby, Swift.
- Standard libraries: Java (
HashMap<k,v>
), C++, C#, F#. - Python and Swift call them dictionaries.
Associative Arrays in Perl
- Called hashes; use hash functions internally.
- Variable names begin with
%
. - Keys: strings; Values: scalars (numbers, strings, references).
- Literal assignment example:
%salaries = ("Gary" => 75000, "Perry" => 57000, "Mary" => 55750, "Cedric" => 47850);
- Access value by key:
$salaries{"Perry"} = 58850;
- Remove an entry:
delete $salaries{"Gary"};
List Types
Lists are fundamental in functional languages (Lisp, Scheme, ML).
- Lisp uses parentheses:
'(A B C)
List Operations
- CAR: returns first element
- CDR: returns tail
- CONS: adds new head
List Support in Languages
- ML & F#:
[1,2,3]
; cons using'::'
- Python: lists are mutable and heterogeneous
- List comprehension:
[x*x for x in range(10) if x%2 == 0]
- List comprehension:
- Haskell:
[x*x | x <- [1..10]]
- Java/C#:
List
,ArrayList