NoSQL Document and Column Databases: Performance and Use Cases

Posted on Jan 10, 2026 in Computers

NoSQL Document and Column Databases: Key Differences

NoSQL databases are broadly categorized into different types, with document-based and column-based models among the most widely used. Document-based databases, such as MongoDB, store data in flexible, self-describing documents—typically in JSON or BSON format—allowing complex, nested structures and varying fields for different records within the same collection. This flexibility supports agile development and easy mapping to application objects, making document stores ideal for applications with evolving data requirements.

In contrast, column-based databases, such as Apache Cassandra, organize data into rows and columns within column families, emphasizing fast write and read performance for large-scale, distributed datasets. Unlike document models, column stores optimize queries across specific columns rather than entire documents, which makes them highly efficient for analytical workloads or time-series data. The key difference, therefore, lies in data organization and access patterns: document models prioritize flexibility and hierarchical data representation, while column models prioritize scalability and high-performance columnar querying across distributed nodes. This distinction directly influences how developers design schemas, manage data growth, and optimize performance for specific application needs.

Relational Algebra Expression (a.i)

(a) i. π S.sname (σ B.color = ‘red’ ((S ⋈ S.sid = R.sid R) ⋈ R.bid = B.bid B))

π S.sname
σ B.color = 'red'
⋈ R.bid = B.bid
   /    \ 
⋈ S.sid = R.sid  B
  /    \ 
 S      R

i. Simple Nested Loops Join (tuple-oriented)

Formula:

For tuple-oriented nested loops join (each tuple of R compared to all tuples of S):

Cost = (number of tuples in R) × (number of pages of S)

Total tuples in R = 10,000
Pages of S = 200

YxWENPRgYpN+DtGHIX2DgH2N3aRNMLHbcAAAAAElFTkSuQmCC

ii. Page-Oriented Nested Loops Join

In a page-oriented nested loops join, each page of R is compared with all pages of S:

Cost = (number of pages of R) + (number of pages of R) × (number of pages of S)

R_pages = 1,000 S_pages = 200

pwpof4xnwDLgy8A9ObedBZX1igQAAAABJRU5ErkJggg==

iii. Block-Nested Loops Join

Block-nested loops join uses B−2 buffer pages for R blocks:

Cost = ceil(R_pages / (B−2)) × S_pages + R_pages

B = 52 → B−2 = 50 pages for R
R_pages = 1,000
S_pages = 200

iv. Index Nested Loops Join (B+ Tree on S)

B+ tree height = 3 → 3 I/Os to reach a leaf
S has 2,000 tuples, 200 pages
R has 10,000 tuples

Cost = number of pages of R + (number of tuples of R × cost to find matching tuple(s) in S using index)

Index cost per lookup = height + 1 page for leaf data → 3 + 1 = 4 I/Os per tuple

BeHJCjuTEtoAAAAASUVORK5CYII=
Each tuple of R uses the B+ tree index on S to find a matching tuple. The index traversal costs 3 I/Os (height) and 1 page to read the leaf data. Multiply by all 10,000 tuples in R, then add the cost to read R’s pages.

a) Find all book titles published after 1995

FLWOR Expression:

for $b in doc("books.xml")/bib/book
where xs:integer($b/@year) > 1995
return $b/title

Expected Output:

NoSQL Document and Column Databases: Performance and Use Cases

NoSQL Document and Column Databases: Key Differences

Relational Algebra Expression (a.i)

i. Simple Nested Loops Join (tuple-oriented)

ii. Page-Oriented Nested Loops Join

iii. Block-Nested Loops Join

iv. Index Nested Loops Join (B+ Tree on S)

a) Find all book titles published after 1995

Data on the Web

The Digital TV

Recent Notes

Subjects

Publicidad