HBase Code Examples and Concepts
HBase Code Examples
Creating a Table
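This code creates the Users table with two column families, PersonalData and LoginData, each keeping up to 10 versions per cell:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.util.Bytes;

    // Assumes hBaseAdmin is an initialized HBaseAdmin field of the enclosing class.
    private void createTable() throws IOException {
        byte[] table = Bytes.toBytes("Users");
        byte[] cf1 = Bytes.toBytes("PersonalData");
        byte[] cf2 = Bytes.toBytes("LoginData");

        HTableDescriptor hTable = new HTableDescriptor(table);

        // Each family keeps up to 10 versions of every cell.
        HColumnDescriptor family1 = new HColumnDescriptor(cf1);
        family1.setMaxVersions(10);
        hTable.addFamily(family1);

        HColumnDescriptor family2 = new HColumnDescriptor(cf2);
        family2.setMaxVersions(10);
        hTable.addFamily(family2);

        this.hBaseAdmin.createTable(hTable);
    }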
Retrieving the Last 3 Logins for a User
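This code reads the three most recent versions of the LoginData:LoginTime column for a given user ID, which hold the times of the user's last three logins:

    import java.io.IOException;
    import java.util.List;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // Assumes config is a Configuration field of the enclosing class.
    private void getTimeThreeLastLogins(String userId) throws IOException {
        byte[] table = Bytes.toBytes("Users");
        byte[] cf = Bytes.toBytes("LoginData");
        byte[] columnLogin = Bytes.toBytes("LoginTime");

        HTable hTable = new HTable(config, table);
        try {
            Get get = new Get(Bytes.toBytes(userId));
            get.addColumn(cf, columnLogin);
            get.setMaxVersions(3); // ask for the three newest versions

            Result res = hTable.get(get);
            List<Cell> valuesLoginTime = res.getColumnCells(cf, columnLogin);
            for (Cell cell : valuesLoginTime) {
                long loginTime = Bytes.toLong(CellUtil.cloneValue(cell));
                System.out.println("Login time: " + loginTime);
            }
        } finally {
            hTable.close();
        }
    }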
Querying Users from Zaragoza within a Name Range
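This code scans the row keys from JUAN (inclusive) to PEPE (exclusive) and keeps only the users whose PROVINCE column equals ZARAGOZA:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    // Assumes config (Configuration) and table (byte[] table name) are fields of the enclosing class.
    private void query() throws IOException {
        byte[] cf = Bytes.toBytes("BasicData");
        byte[] column = Bytes.toBytes("PROVINCE");
        byte[] column2 = Bytes.toBytes("LAST_LOGIN");

        // The start row is inclusive and the stop row is exclusive.
        Scan scan = new Scan(Bytes.toBytes("JUAN"), Bytes.toBytes("PEPE"));
        SingleColumnValueFilter f = new SingleColumnValueFilter(
                cf, column, CompareFilter.CompareOp.EQUAL, Bytes.toBytes("ZARAGOZA"));
        // Without this, rows that lack the PROVINCE column would also pass the filter.
        f.setFilterIfMissing(true);
        scan.setFilter(f);

        HTable hTable = new HTable(config, table);
        try {
            ResultScanner rs = hTable.getScanner(scan);
            try {
                for (Result res : rs) {
                    String key = Bytes.toString(res.getRow());
                    String province = Bytes.toString(res.getValue(cf, column));
                    String lastLogin = Bytes.toString(res.getValue(cf, column2));
                    System.out.println(key + " " + province + " " + lastLogin);
                }
            } finally {
                rs.close();
            }
        } finally {
            hTable.close();
        }
    }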
Additional Code Examples
- Generating Keys: Demonstrates how to generate keys based on date and vendor ID.
- Querying Trips by Payment Type: Retrieves trips within a specific time period that were paid with a certain payment type.
- Querying Users by Last Name and Year: Finds users with a specific last name who logged in during a particular year.
- Checking User Existence: Verifies if a user with a given name and ID exists.
- Counting Users: Counts the total number of users in the table.
- Querying Users by Last Name: Retrieves all users with a specific last name.
- Querying Users by Last Name and Province: Finds users with a specific last name and province.
- Retrieving Last Sessions: Gets the last few session timestamps for a user with a given last name.
- Put Operation: Demonstrates how to insert data into a table.
- Get Operation: Shows how to retrieve data from a table.
- Delete Row: Illustrates how to delete a row from a table.
- Drop Table: Explains how to drop a table in HBase.
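The key-generation example in the list above is not reproduced in these notes. As a rough illustration only, the sketch below builds a row key from a date and a vendor ID in plain Java. The class name `TripKeys`, the `yyyyMMddHHmmss_vendor` layout, and the four-digit padding are all assumptions made for this sketch, not the original code. Fixed-width fields are used so that the lexicographic order of the keys matches chronological order, which is what makes a time-period scan possible.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: the key layout is an assumption made for illustration.
final class TripKeys {
    // Fixed-width timestamp, so string order equals time order.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Builds "<timestamp>_<zero-padded vendor id>".
    static String rowKey(LocalDateTime date, int vendorId) {
        return date.format(FORMAT) + "_" + String.format(Locale.ROOT, "%04d", vendorId);
    }

    public static void main(String[] args) {
        System.out.println(rowKey(LocalDateTime.of(2016, 1, 5, 8, 30, 0), 2));
    }
}
```

With this layout, a scan over a time period could use the formatted start and end timestamps as its start and stop rows.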
HBase and Bigtable Concepts
Bigtable Overview
Bigtable is a distributed, persistent, multi-dimensional sorted map designed for large-scale data storage and retrieval. It offers:
- Sparse data structure: Rows can have different columns.
- Distribution across nodes for scalability.
- Durability through GFS (Google File System).
- Multiple timestamped versions of the value stored at each row and column.
- Lexicographically sorted keys for efficient access.
Key concepts in Bigtable include:
- Row Key: A unique identifier for each row, typically 10-100 bytes long.
- Tablet: A unit of distribution and load balancing, consisting of a range of rows.
- Column Family: A group of semantically related columns.
- Cell: The intersection of a row, column, and timestamp, containing a value.
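The concepts above can be pictured with a small in-memory model. The sketch below is not Bigtable or HBase code; `SparseTable` and its methods are hypothetical names, and nested `TreeMap`s stand in for the sorted, versioned storage. It shows rows kept in lexicographic order, rows holding different columns, and several timestamped versions per cell with the newest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative model only: row key -> "family:qualifier" -> timestamp -> value.
final class SparseTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            // Timestamps are sorted in descending order, so the newest version comes first.
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Up to max versions of one cell, newest first; empty if the cell does not exist.
    List<String> versions(String row, String column, int max) {
        List<String> result = new ArrayList<>();
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return result;
        }
        for (String value : columns.get(column).values()) {
            if (result.size() == max) {
                break;
            }
            result.add(value);
        }
        return result;
    }

    // Row keys in [startRow, stopRow), in lexicographic order.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        SparseTable table = new SparseTable();
        table.put("u2", "LoginData:LoginTime", 100L, "first login");
        table.put("u2", "LoginData:LoginTime", 200L, "second login");
        table.put("u1", "PersonalData:Name", 50L, "Ana");
        System.out.println(table.versions("u2", "LoginData:LoginTime", 3));
        System.out.println(table.scan("u1", "u3"));
    }
}
```

Because absent columns are simply missing from the inner map, a row with one column costs nothing for the columns it lacks.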
HBase Architecture
HBase is an open-source implementation of the Bigtable model. Key components include:
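- HBase Master: Handles schema changes (creating, altering, and dropping tables), region assignment, and load balancing.
- Region Server: Serves reads and writes for the set of regions assigned to it.
- Root and Meta Tables: System (catalog) tables used to locate regions. -ROOT- records where the regions of .META. live, and .META. maps each user-table region to the Region Server hosting it. HBase 0.96 and later drop -ROOT- and keep the location of the meta table in ZooKeeper.
- HLog: A write-ahead log. Each edit is appended to it before being applied in memory, so it can be replayed after a Region Server crash.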
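The lookup that the catalog tables make possible can be sketched as a sorted map from region start key to Region Server. This is a simplified model, not the HBase client: `MetaLookupSketch`, its methods, and the server names are hypothetical. Since each region covers the keys from its start key up to the next region's start key, the region for a row is the one with the greatest start key that is less than or equal to the row key.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of a catalog lookup; not the HBase client API.
final class MetaLookupSketch {
    // Region start key -> server hosting that region. "" is the start key of the first region.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // Finds the region whose start key is the greatest one <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("No region covers row " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        MetaLookupSketch meta = new MetaLookupSketch();
        meta.addRegion("", "rs1");
        meta.addRegion("M", "rs2");
        meta.addRegion("T", "rs3");
        System.out.println(meta.locate("JUAN"));
        System.out.println(meta.locate("PEPE"));
    }
}
```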
Data Operations
HBase supports various data operations:
- Get: Retrieves a single row.
- Scan: Reads multiple rows, often with filters.
- Put: Inserts or updates data.
- Delete: Marks cells or rows for deletion.
Coprocessors
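Coprocessors allow running user code on Region Servers, enabling custom logic and data processing close to the data. They come in two kinds: observers, which hook into operations such as Get or Put much like triggers, and endpoints, which expose custom server-side calls much like stored procedures.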
Read and Write Operations
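HBase read and write operations involve ZooKeeper, the .META. table, and Region Servers. The client asks ZooKeeper where the catalog table lives, reads .META. to find the Region Server hosting the row, and then talks to that Region Server directly. A write is first appended to the HLog and then stored in the MemStore; when the MemStore fills up, it is flushed to a new HFile for persistence. A read merges the MemStore with the HFiles of the region.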
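The write path described above can be illustrated with a toy model. `WritePathSketch` is a hypothetical class, not HBase code, and it makes simplifying assumptions: the flush threshold counts entries rather than bytes, one value is kept per row, and the log is discarded as soon as its edits are flushed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of log, in-memory store, and flushed files; not HBase code.
final class WritePathSketch {
    private final int flushThreshold; // assumption: counted in entries, not bytes
    final List<String> wal = new ArrayList<>();
    final List<SortedMap<String, String>> hFiles = new ArrayList<>();
    private TreeMap<String, String> memStore = new TreeMap<>();

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String row, String value) {
        wal.add(row + "=" + value);   // 1. append to the log first
        memStore.put(row, value);     // 2. then apply the edit in memory
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    // Writes the sorted in-memory contents out as a new immutable file.
    void flush() {
        if (memStore.isEmpty()) {
            return;
        }
        hFiles.add(Collections.unmodifiableSortedMap(memStore));
        memStore = new TreeMap<>();
        wal.clear(); // the flushed edits no longer need the log for recovery
    }

    // Reads check memory first, then files from newest to oldest.
    String get(String row) {
        if (memStore.containsKey(row)) {
            return memStore.get(row);
        }
        for (int i = hFiles.size() - 1; i >= 0; i--) {
            String value = hFiles.get(i).get(row);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        WritePathSketch store = new WritePathSketch(2);
        store.put("a", "1");
        store.put("b", "2"); // reaches the threshold and triggers a flush
        store.put("a", "3");
        System.out.println(store.get("a") + " " + store.get("b"));
    }
}
```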
Region Splitting
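When a region grows past a configured size threshold, it is split automatically into two daughter regions at a middle row key. An administrator can also trigger a split manually. Splitting keeps individual regions small enough to serve efficiently and lets the load spread across Region Servers as data grows.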
Column Families vs. Relational Databases
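Column families offer advantages for sparse data and selective retrieval: columns that are absent from a row take no space, and a query that needs one family does not have to read the others. Relational databases excel at structured data and complex queries such as joins.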
Region Creation
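A table starts with a single region unless it is pre-split at creation time with a list of split keys (manual creation). After that, new regions are created automatically (auto-splitting) whenever an existing region exceeds the size threshold.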
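Both ways of creating regions can be sketched with a small model. `SplitSketch` is hypothetical and simplified: it counts rows instead of measuring store size, and it splits at the median row key of the region. Passing split keys to the constructor models pre-splitting, while exceeding `maxRowsPerRegion` models auto-splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of pre-splitting and auto-splitting; not HBase code.
final class SplitSketch {
    static final class Region {
        final String startKey; // inclusive lower bound of the region
        final TreeMap<String, String> rows = new TreeMap<>();

        Region(String startKey) {
            this.startKey = startKey;
        }
    }

    private final int maxRowsPerRegion; // assumption: a row count stands in for a size in bytes
    final TreeMap<String, Region> regions = new TreeMap<>(); // keyed by start key

    // Pre-splitting: one region per split key, plus the first region starting at "".
    SplitSketch(int maxRowsPerRegion, String... splitKeys) {
        this.maxRowsPerRegion = maxRowsPerRegion;
        regions.put("", new Region(""));
        for (String key : splitKeys) {
            regions.put(key, new Region(key));
        }
    }

    void put(String row, String value) {
        Region region = regions.floorEntry(row).getValue();
        region.rows.put(row, value);
        if (region.rows.size() > maxRowsPerRegion) {
            split(region);
        }
    }

    // Auto-splitting: move the upper half of the rows into a new daughter region.
    private void split(Region parent) {
        List<String> keys = new ArrayList<>(parent.rows.keySet());
        String midKey = keys.get(keys.size() / 2);
        Region daughter = new Region(midKey);
        daughter.rows.putAll(parent.rows.tailMap(midKey, true));
        parent.rows.tailMap(midKey, true).clear();
        regions.put(midKey, daughter);
    }

    public static void main(String[] args) {
        SplitSketch table = new SplitSketch(4);
        for (String row : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(row, "value");
        }
        System.out.println(table.regions.keySet());
    }
}
```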
Data Access and Deletion
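Clients cache region locations and talk to Region Servers directly, so they can access data without contacting the HBase Master if the location is known. A delete does not remove data immediately: it writes a marker (tombstone) that hides the affected cells, and both the marker and the hidden cells are physically removed during a major compaction.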
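The deletion behavior described above can be sketched with an append-only list of cells. `TombstoneSketch` is a hypothetical model, not HBase code, and it assumes one value per row: a delete appends a marker, reads treat a marker as "not found", and only `majorCompact` reclaims the space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of delete markers and major compaction; not HBase code.
final class TombstoneSketch {
    private static final class Cell {
        final String row;
        final String value;      // null for a tombstone
        final boolean tombstone;

        Cell(String row, String value, boolean tombstone) {
            this.row = row;
            this.value = value;
            this.tombstone = tombstone;
        }
    }

    private List<Cell> cells = new ArrayList<>(); // append-only, oldest first

    void put(String row, String value) {
        cells.add(new Cell(row, value, false));
    }

    // A delete only appends a marker; older cells stay in place.
    void delete(String row) {
        cells.add(new Cell(row, null, true));
    }

    // The newest cell for the row decides the result.
    String get(String row) {
        for (int i = cells.size() - 1; i >= 0; i--) {
            Cell cell = cells.get(i);
            if (cell.row.equals(row)) {
                return cell.tombstone ? null : cell.value;
            }
        }
        return null;
    }

    int storedCells() {
        return cells.size();
    }

    // Rewrites the store, dropping tombstones and everything they hide.
    void majorCompact() {
        Map<String, Cell> newest = new TreeMap<>();
        for (Cell cell : cells) {
            newest.put(cell.row, cell); // later cells replace earlier ones
        }
        List<Cell> compacted = new ArrayList<>();
        for (Cell cell : newest.values()) {
            if (!cell.tombstone) {
                compacted.add(cell);
            }
        }
        cells = compacted;
    }

    public static void main(String[] args) {
        TombstoneSketch store = new TombstoneSketch();
        store.put("u1", "Ana");
        store.put("u2", "Juan");
        store.delete("u1");
        System.out.println(store.storedCells()); // marker and old cell are still stored
        store.majorCompact();
        System.out.println(store.storedCells());
    }
}
```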
Structure of Bigtable/HBase
Bigtable/HBase is a multi-dimensional map indexed by row key, column key, and timestamp. Data is stored in lexicographic order by row key, and tables are split into regions for scalability.
This document provides a basic understanding of HBase code examples and key concepts related to Bigtable and HBase architecture, data operations, and region management.