HBase Code Examples and Concepts

Posted on May 6, 2024 in Computers

HBase Code Examples

Creating a Table

This code demonstrates how to create a table in HBase with two column families:

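import java.io.IOException;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.util.Bytes;

// Classic HTableDescriptor/HBaseAdmin API (deprecated in HBase 2.x); assumes hBaseAdmin is an HBaseAdmin field
private void createTable() throws IOException {
    byte[] table = Bytes.toBytes("Users"); // matches the table read in the next example
    byte[] cf1 = Bytes.toBytes("PersonalData");
    byte[] cf2 = Bytes.toBytes("LoginData");
    HTableDescriptor hTable = new HTableDescriptor(table);
    // Keep up to 10 versions (timestamps) of every cell in each family
    HColumnDescriptor family1 = new HColumnDescriptor(cf1);
    family1.setMaxVersions(10);
    hTable.addFamily(family1);
    HColumnDescriptor family2 = new HColumnDescriptor(cf2);
    family2.setMaxVersions(10);
    hTable.addFamily(family2);
    this.hBaseAdmin.createTable(hTable);
}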

Retrieving the Last 3 Logins for a User

This code retrieves the timestamps of the last three logins for a given user ID:

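import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Classic HTable API; assumes `config` is a Configuration field and that
// login times were stored as 8-byte longs (Bytes.toBytes(long))
private void getTimeThreeLastLogins(String userId) throws IOException {
    byte[] table = Bytes.toBytes("Users");
    byte[] cf = Bytes.toBytes("LoginData");
    byte[] columnLogin = Bytes.toBytes("LoginTime");
    HTable hTable = new HTable(config, table);
    try {
        Get get = new Get(Bytes.toBytes(userId));
        get.addColumn(cf, columnLogin);
        get.setMaxVersions(3); // up to the 3 most recent versions of LoginTime
        Result res = hTable.get(get);
        // Cells for one column come back newest first
        List<Cell> valuesLoginTime = res.getColumnCells(cf, columnLogin);
        for (Cell cell : valuesLoginTime) {
            long loginTime = Bytes.toLong(CellUtil.cloneValue(cell));
            System.out.println("Login time: " + loginTime);
        }
    } finally {
        hTable.close();
    }
}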

Querying Users from Zaragoza within a Name Range

This code retrieves users from Zaragoza whose row keys (user names) fall in the range from JUAN (inclusive) to PEPE (exclusive):

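import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Classic HTable API; assumes `config` (Configuration) and `table` (byte[] table name) are fields
private void query() throws IOException {
    byte[] cf = Bytes.toBytes("BasicData");
    byte[] column = Bytes.toBytes("PROVINCE");
    byte[] column2 = Bytes.toBytes("LAST_LOGIN");
    HTable hTable = new HTable(config, table);
    // Start row is inclusive, stop row is exclusive: [JUAN, PEPE)
    Scan scan = new Scan(Bytes.toBytes("JUAN"), Bytes.toBytes("PEPE"));
    SingleColumnValueFilter f = new SingleColumnValueFilter(
        cf, column, CompareFilter.CompareOp.EQUAL, Bytes.toBytes("ZARAGOZA"));
    // Without this, rows that have no PROVINCE column would also be returned
    f.setFilterIfMissing(true);
    scan.setFilter(f);
    ResultScanner rs = hTable.getScanner(scan);
    try {
        for (Result res : rs) {
            String key = Bytes.toString(res.getRow());
            String province = Bytes.toString(res.getValue(cf, column));
            String lastLogin = Bytes.toString(res.getValue(cf, column2));
            System.out.println(key + " " + province + " " + lastLogin);
        }
    } finally {
        rs.close();
        hTable.close();
    }
}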

Additional Code Examples

Generating Keys: Demonstrates how to generate keys based on date and vendor ID.
Querying Trips by Payment Type: Retrieves trips within a specific time period that were paid with a certain payment type.
Querying Users by Last Name and Year: Finds users with a specific last name who logged in during a particular year.
Checking User Existence: Verifies if a user with a given name and ID exists.
Counting Users: Counts the total number of users in the table.
Querying Users by Last Name: Retrieves all users with a specific last name.
Querying Users by Last Name and Province: Finds users with a specific last name and province.
Retrieving Last Sessions: Gets the last few session timestamps for a user with a given last name.
Put Operation: Demonstrates how to insert data into a table.
Get Operation: Shows how to retrieve data from a table.
Delete Row: Illustrates how to delete a row from a table.
Drop Table: Explains how to drop a table in HBase.
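The list above only names these examples. For the first one, here is a minimal plain-Java sketch (no HBase client) of a row key built from a date and a vendor ID. The layout is an assumption for illustration, not the original example's scheme. It uses a fixed-width `yyyyMMddHHmmss` timestamp followed by a zero-padded vendor ID, and the names `TripKeys`, `tripKey` and `scanRange` are hypothetical. Putting the date first keeps all trips from one time period in a contiguous key range, so a single start/stop-row scan can read them. Fixed-width fields keep lexicographic order consistent with chronological order.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical key layout: <yyyyMMddHHmmss>_<vendorId zero-padded to 4 digits>
final class TripKeys {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Fixed-width fields make string (byte) order match chronological order
    static String tripKey(LocalDateTime pickup, int vendorId) {
        return pickup.format(TS) + "_" + String.format("%04d", vendorId);
    }

    // Start row (inclusive) and stop row (exclusive) covering all trips in [from, to)
    static String[] scanRange(LocalDateTime from, LocalDateTime to) {
        return new String[] { from.format(TS), to.format(TS) };
    }
}
```

For example, `tripKey(LocalDateTime.of(2024, 5, 6, 9, 30), 7)` yields `20240506093000_0007`, which sorts inside the range returned for May 2024. One caveat is that time-leading keys send all current writes to the same region (hotspotting). Salting the key or leading with the vendor ID are common alternatives when write load matters.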

HBase and Bigtable Concepts

Bigtable Overview

Bigtable is a distributed, persistent, multi-dimensional sorted map designed for large-scale data storage and retrieval. It offers:

Sparse data structure: Rows can have different columns.
Distribution across nodes for scalability.
Durability through GFS (Google File System).
Multiple values per row and column (versions).
Lexicographically sorted keys for efficient access.

Key concepts in Bigtable include:

Row Key: A unique identifier for each row, typically 10-100 bytes long.
Tablet: A unit of distribution and load balancing, consisting of a range of rows.
Column Family: A group of semantically related columns.
Cell: The intersection of a row, column, and timestamp, containing a value.
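These concepts can be modeled as a nested sorted map: row key → column (`family:qualifier`) → timestamp → value. The sketch below is a hypothetical single-process model (`SortedMapModel` and its methods are illustrative names), not how Bigtable or HBase store data on disk. It shows three properties from the overview:

- Rows stay in lexicographic order.
- A row holds only the columns actually written, which is the source of sparseness.
- Versions are kept newest first, and the oldest are dropped beyond a maximum.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model: row -> "family:qualifier" -> timestamp (newest first) -> value
final class SortedMapModel {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows = new TreeMap<>();
    private final int maxVersions;

    SortedMapModel(int maxVersions) { this.maxVersions = maxVersions; }

    void put(String row, String column, long ts, String value) {
        NavigableMap<Long, String> versions = rows
            .computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()));
        versions.put(ts, value);
        // Order is newest first, so the last entries are the oldest versions
        while (versions.size() > maxVersions) versions.pollLastEntry();
    }

    // Up to n most recent values of one cell, newest first; empty if the cell was never written
    List<String> get(String row, String column, int n) {
        List<String> out = new ArrayList<>();
        Map<String, NavigableMap<Long, String>> cols = rows.get(row);
        if (cols == null || !cols.containsKey(column)) return out;
        for (String v : cols.get(column).values()) {
            if (out.size() == n) break;
            out.add(v);
        }
        return out;
    }

    // Row keys in lexicographic order
    List<String> rowKeys() { return new ArrayList<>(rows.keySet()); }
}
```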

HBase Architecture

HBase is an open-source implementation of the Bigtable model. Key components include:

HBase Master: Assigns regions to Region Servers, balances load across them, and handles schema changes such as creating and deleting tables.
Region Server: Serves data for a set of regions.
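Root and Meta Tables: Catalog tables that record which Region Server hosts each region of every user table, so clients can locate the region for a given row key. Since HBase 0.96, -ROOT- has been removed and .META. is named hbase:meta.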
HLog: A write-ahead log for data durability.
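To find the Region Server for a row, a client needs the region whose [start key, end key) range contains that row key. The meta table records this mapping. The plain-Java sketch below models the lookup with a `TreeMap` keyed by region start key. There, `floorEntry` returns the region with the greatest start key that is not larger than the row key. The class name, server names and split points are hypothetical. The real client also consults ZooKeeper and caches locations.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the meta table: region start key -> hosting Region Server.
// The first region starts at the empty key "", so every row key falls into some region.
// Java String order matches HBase byte order for ASCII keys.
final class MetaLookup {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // The containing region is the one with the greatest start key <= rowKey
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        if (e == null) throw new IllegalStateException("no region covers " + rowKey);
        return e.getValue();
    }
}
```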

Data Operations

HBase supports various data operations:

Get: Retrieves a single row.
Scan: Reads multiple rows, often with filters.
Put: Inserts or updates data.
Delete: Marks cells or rows for deletion.
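Of these operations, Scan has the subtlest semantics. It reads a contiguous range of row keys, with the start row inclusive and the stop row exclusive, and can apply a filter to each row. The plain-Java sketch below models this over a sorted map of rows using `NavigableMap.subMap`. The class and the row layout are hypothetical and only loosely mirror the Zaragoza query earlier in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.Predicate;

// Hypothetical model of Scan semantics over rows sorted by key (row key -> column -> value)
final class ScanModel {
    static List<String> scan(NavigableMap<String, Map<String, String>> rows,
                             String startRow, String stopRow,
                             Predicate<Map<String, String>> filter) {
        List<String> matches = new ArrayList<>();
        // subMap(start, true, stop, false) is exactly the half-open range [startRow, stopRow)
        for (Map.Entry<String, Map<String, String>> e : rows.subMap(startRow, true, stopRow, false).entrySet()) {
            if (filter.test(e.getValue())) matches.add(e.getKey());
        }
        return matches;
    }
}
```

A predicate such as `r -> "ZARAGOZA".equals(r.get("PROVINCE"))` rejects rows that lack the column. This is the behavior `setFilterIfMissing(true)` gives a `SingleColumnValueFilter`.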

Coprocessors

Coprocessors allow running user code on Region Servers, enabling custom logic and data processing close to the data.
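The benefit of running code next to the data is easiest to see with an aggregate such as counting rows. Instead of streaming every row to the client, each region computes a partial result locally, and the client only merges the partial results. The plain-Java sketch below simulates that pattern, with each region represented as an in-memory list of row keys. It does not use the HBase coprocessor API, and `RegionCount` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical simulation of an endpoint-style aggregation
final class RegionCount {
    // "Server side": runs next to one region's data; only a single number leaves the region
    static long countInRegion(List<String> regionRows, Predicate<String> filter) {
        return regionRows.stream().filter(filter).count();
    }

    // "Client side": fan out to every region and sum the partial counts
    static long countAll(List<List<String>> regions, Predicate<String> filter) {
        long total = 0;
        for (List<String> region : regions) total += countInRegion(region, filter);
        return total;
    }
}
```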

Read and Write Operations

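HBase reads and writes start by locating the right Region Server. The client obtains the location of the catalog tables from ZooKeeper, looks up the region for the row key in the .META. table, and then talks to that Region Server directly. On the write path, each edit is first appended to the write-ahead log (HLog) and then applied to the in-memory MemStore. When the MemStore fills up, it is flushed to an immutable HFile in the underlying file system (typically HDFS).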
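The write path can be sketched in a few steps. Each edit is appended to a write-ahead log, then applied to a sorted in-memory MemStore. Once the MemStore passes a threshold, it is flushed as a new immutable, sorted file. This is a simplified plain-Java model with hypothetical names. It uses an entry-count threshold for brevity, whereas HBase flushes based on a byte size (`hbase.hregion.memstore.flush.size`). It also ignores timestamps, WAL rolling and compactions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of one region's write path
final class WritePath {
    final List<String> wal = new ArrayList<>();                      // append-only log, arrival order
    TreeMap<String, String> memStore = new TreeMap<>();               // sorted, in memory
    final List<SortedMap<String, String>> hFiles = new ArrayList<>(); // immutable flushed files
    private final int flushThreshold;                                 // entry count, for simplicity

    WritePath(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value); // 1. log the edit for durability
        memStore.put(rowKey, value);   // 2. then apply it in memory
        if (memStore.size() >= flushThreshold) flush();
    }

    // Persist the MemStore as a new read-only sorted file and start an empty MemStore
    void flush() {
        hFiles.add(Collections.unmodifiableSortedMap(memStore));
        memStore = new TreeMap<>();
    }

    // Reads check the MemStore first, then files from newest to oldest
    String get(String rowKey) {
        if (memStore.containsKey(rowKey)) return memStore.get(rowKey);
        for (int i = hFiles.size() - 1; i >= 0; i--) {
            String v = hFiles.get(i).get(rowKey);
            if (v != null) return v;
        }
        return null;
    }
}
```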

Region Splitting

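When a region grows past a configured size threshold, HBase splits it into two daughter regions at a middle row key. This keeps regions bounded in size and lets the halves be served by different Region Servers. Splits can also be requested manually.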
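A split can be pictured as cutting a region's sorted key range at a middle row key, producing daughter regions [start, mid) and [mid, end). The sketch below models a region as a sorted set of row keys and splits it at the median key once it holds more than a hypothetical maximum number of rows. HBase itself decides by store file size and takes the split point from the middle of its largest store file. The names and the row-count threshold here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model: a region is just its sorted row keys
final class RegionSplit {
    // Middle row key, or null if the region is still small enough
    static String splitPoint(TreeSet<String> rows, int maxRows) {
        if (rows.size() <= maxRows) return null;
        List<String> sorted = new ArrayList<>(rows);
        return sorted.get(sorted.size() / 2);
    }

    // Daughter regions: [start, mid) and [mid, end)
    static List<TreeSet<String>> split(TreeSet<String> rows, String mid) {
        List<TreeSet<String>> daughters = new ArrayList<>();
        daughters.add(new TreeSet<>(rows.headSet(mid, false)));
        daughters.add(new TreeSet<>(rows.tailSet(mid, true)));
        return daughters;
    }
}
```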

Column Families vs. Relational Databases

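Column families suit sparse data, since absent columns take no storage. They also suit selective retrieval, since a read can fetch only the families it needs. Relational databases are stronger for uniformly structured data and for complex queries such as joins.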

Region Creation

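Regions are created in one of two ways. With pre-splitting, they are created manually by supplying split keys when the table is created. With auto-splitting, they are created automatically when a region exceeds its size threshold.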
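Pre-splitting requires choosing the split keys up front. Suppose row keys begin with a uniformly distributed lowercase hexadecimal character, for example a hash prefix (this key design is an assumption). Evenly spaced split keys can then be computed as in the plain-Java sketch below. The resulting `byte[][]` is the kind of value passed as the `splitKeys` argument of `HBaseAdmin.createTable`. The helper name `hexSplitKeys` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: split keys for row keys that start with one lowercase hex character (0-f).
// n regions need n - 1 split keys; the first region starts at the empty key, the last is unbounded.
final class PreSplit {
    static byte[][] hexSplitKeys(int regions) {
        if (regions < 1 || regions > 16) throw new IllegalArgumentException("regions must be 1..16");
        byte[][] splitKeys = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            int boundary = i * 16 / regions; // first hex digit owned by region i
            splitKeys[i - 1] = Integer.toHexString(boundary).getBytes(StandardCharsets.UTF_8);
        }
        return splitKeys;
    }
}
```

With 4 regions this gives the split keys `4`, `8` and `c`, so each region owns four leading hex digits.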

Data Access and Deletion

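Clients do not go through the HBase Master for reads and writes. Once they know (and cache) a region's location, they contact the Region Server directly. Deleted cells are only marked with delete markers (tombstones) and are physically removed during major compactions.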
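Deletes do not modify stored data in place. A delete writes a marker that hides matching cells from reads. A major compaction later rewrites the data without the hidden cells and drops the markers. The plain-Java sketch below models this for whole-row deletes, with one timestamped value per row. Class and method names are hypothetical. As with an HBase row delete, a marker at timestamp T hides cells with timestamps up to and including T.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of row deletes with tombstones (one value per row; the last put overwrites)
final class TombstoneStore {
    static final class Cell {
        final long ts;
        final String value;
        Cell(long ts, String value) { this.ts = ts; this.value = value; }
    }

    final TreeMap<String, Cell> cells = new TreeMap<>();   // data still present in storage
    final Map<String, Long> tombstones = new HashMap<>();  // row -> newest delete marker timestamp

    void put(String row, long ts, String value) { cells.put(row, new Cell(ts, value)); }

    // A delete only records a marker; nothing is removed yet
    void deleteRow(String row, long ts) { tombstones.merge(row, ts, Long::max); }

    private boolean masked(String row, Cell c) {
        Long marker = tombstones.get(row);
        return marker != null && c.ts <= marker;
    }

    String get(String row) {
        Cell c = cells.get(row);
        return (c == null || masked(row, c)) ? null : c.value;
    }

    // Major compaction: physically drop masked cells, then the markers themselves
    void majorCompact() {
        cells.entrySet().removeIf(e -> masked(e.getKey(), e.getValue()));
        tombstones.clear();
    }
}
```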

Structure of Bigtable/HBase

Bigtable/HBase is a multi-dimensional map indexed by row key, column key, and timestamp. Data is stored in lexicographic order by row key, and tables are split into regions for scalability.

This document provides a basic understanding of HBase code examples and key concepts related to Bigtable and HBase architecture, data operations, and region management.
