There is a significant set of applications and frameworks that we utilize. Web servers, caches, databases, resource managers, distributed processing systems, full text search indexes. They all have to be redundant to make the system highly available. Failure of one or even several servers, network outage, death of a process, neither of that should be a source of the overall downtime.
MySQL, the database system that we use as our main data storage, has long been a single point of failure in the system. If the database process goes down or hardware on the server fails, our platform is seriously harmed by this. It takes time to react and handle the problem. Worst of all, while frontend is affected, we risk losing money. We wanted to make our data store redundant and highly available, that is why we tried Apache Phoenix.
If you already use Hadoop and HBase, Phoenix is a good extension. It provides the well-known SQL query language, that is much more convenient than HBase shell language or HBase Java API. A primary benefit is that it runs all queries in parallel on all region servers. As a result, HBase works faster and more efficiently. It is easy to install, because it’s just a library and doesn’t require deployment. It is compatible with the JDBC driver so that migration from regular relational database system doesn’t require major code changes.
These are some of the features that Phoenix provides:
hadoop-2.7.1 hbase-0.98 phoenix-4.5.0 7 HBase region servers 23 GB of data split across different tables and schemas.
Then we found out that Phoenix requires some syntax changes, which have been documented in the table below. Syntax differences between Phoenix and MySQL:
|UPSERT INTO t (col) VALUES (val);||INSERT INTO t (col) VALUES (val);|
|UPSERT INTO t (PK, col1) SELECT PK, ‘val1’ FROM t WHERE col2 = ‘val2’; (*)||UPDATE t SET col1 = ‘val1’ WHERE col2 = ‘val2’;|
|CREATE TABLE t (id BIGINT(20) UNSIGNED NOT NULL); CREATE SEQUENCE t_seq; UPSERT INTO t (col) VALUES (NEXT VALUE FROM t_seq);||CREATE TABLE t (id BIGINT(20) UNSIGNED NOT NULL AUTO-INCREMENT); INSERT INTO t (col) VALUES (NULL);|
|UPSERT INTO t (date_col) SELECT TO_DATE(‘some_date’) FROM t; (**)||UPDATE t SET date_col = ‘some_date’;|
|SELECT DATE(NOW()) – 7 FROM t;||SELECT TIMESTAMPADD(DAY, -7, DATE(NOW())) FROM t;|
|SELECT NOW() – 4.0 / 24.0 FROM t; (***)||SELECT DATE_SUB(NOW(), INTERVAL 4 HOUR) FROM t;|
|SELECT (NOW() – date_col)*24.0 FROM t;||SELECT HOUR(TIMEDIFF(NOW(), date_col)) FROM t;|
|SELECT * FROM t WHERE LENGTH(REGEXP_SUBSTR (col, ‘regex’)) > 0;||SELECT * FROM t WHERE col RLIKE ‚regex‘;|
|SELECT * FROM t WHERE col IS NULL;||SELECT * FROM t WHERE ISNULL(col); SELECT * FROM t WHERE col IS NULL;|
The first query was executed on a table containing 20 million rows. As you can see, the performance of Phoenix when using indexes is much better than the results obtained with MySQL. You can notice that we tried different index types (global index vs local index) to achieve better performance. Nevertheless, the strategy about how to choose which type to use was not so clear.
For the second query we can see again that Phoenix is much faster than MySQL, even without indexes. However, we started to see some problems when we tried running join queries. First of all, some joins would not work with the default settings. The problem being that Phoenix by default uses hash-join algorithm which is faster than the alternative sort merge join algorithm, but it requires enough memory for the hash cache that it uses. in order to fix this situation we tried two options:
One example of this is secondary indexes. When used for filter search, they reduce full-scan queries to instant lookup or range scan. This is a frequent use case in our system. Сounting all records or subset of them in the table is another important use case and Phoenix does this really well.
All in all, the SQL language is much easier to use than HBase shell language. This is useful when we need to get insight into the data during testing or while investigating problems.*This article was created by Lucia Pasarin Perea & Viacheslav Inozemtsev.