NoSQL Document and Column Databases: Performance and Use Cases
NoSQL Document and Column Databases: Key Differences
NoSQL databases are broadly categorized into different types, with document-based and column-based models among the most widely used. Document-based databases, such as MongoDB, store data in flexible, self-describing documents—typically in JSON or BSON format—allowing complex, nested structures and varying fields for different records within the same collection. This flexibility supports agile development and easy mapping to application objects, making document stores ideal for applications with evolving data requirements.
In contrast, column-based databases, such as Apache Cassandra, organize data into rows and columns within column families, emphasizing fast write and read performance for large-scale, distributed datasets. Unlike document models, column stores optimize queries across specific columns rather than entire documents, which makes them highly efficient for analytical workloads or time-series data. The key difference, therefore, lies in data organization and access patterns: document models prioritize flexibility and hierarchical data representation, while column models prioritize scalability and high-performance columnar querying across distributed nodes. This distinction directly influences how developers design schemas, manage data growth, and optimize performance for specific application needs.
Relational Algebra Expression (a.i)
(a) i. π S.sname (σ B.color = ‘red’ ((S ⋈ S.sid = R.sid R) ⋈ R.bid = B.bid B))
π S.sname σ B.color = 'red' ⋈ R.bid = B.bid / \ ⋈ S.sid = R.sid B / \ S R
i. Simple Nested Loops Join (tuple-oriented)
Formula:
For tuple-oriented nested loops join (each tuple of R compared to all tuples of S):
Cost = (number of tuples in R) × (number of pages of S)
- Total tuples in R = 10,000
- Pages of S = 200
ii. Page-Oriented Nested Loops Join
In a page-oriented nested loops join, each page of R is compared with all pages of S:
Cost = (number of pages of R) + (number of pages of R) × (number of pages of S)
- R_pages = 1,000 S_pages = 200
iii. Block-Nested Loops Join
Block-nested loops join uses B−2 buffer pages for R blocks:
Cost = ceil(R_pages / (B−2)) × S_pages + R_pages
- B = 52 → B−2 = 50 pages for R
- R_pages = 1,000
- S_pages = 200
iv. Index Nested Loops Join (B+ Tree on S)
- B+ tree height = 3 → 3 I/Os to reach a leaf
- S has 2,000 tuples, 200 pages
- R has 10,000 tuples
Cost = number of pages of R + (number of tuples of R × cost to find matching tuple(s) in S using index)
- Index cost per lookup = height + 1 page for leaf data → 3 + 1 = 4 I/Os per tuple
Each tuple of R uses the B+ tree index on S to find a matching tuple. The index traversal costs 3 I/Os (height) and 1 page to read the leaf data. Multiply by all 10,000 tuples in R, then add the cost to read R’s pages.
a) Find all book titles published after 1995
FLWOR Expression:
for $b in doc("books.xml")/bib/book
where xs:integer($b/@year) > 1995
return $b/titleExpected Output:
