Beginning HBase…

(Please note: this is not purely my own work; it is assimilated from various sources on the internet. I am a complete newbie at this and am just making an effort to prepare some personal notes here.)

I have just started tinkering with HBase and am trying to understand its architectural components, such as:

1) HMaster – a lightweight process responsible for monitoring all RegionServer instances in the cluster. It is the interface for all metadata changes and assigns regions to RegionServers for load balancing.

2) HRegion — aaa

3) HRegionServer — is responsible for serving and managing regions.

4) Quorum

5) ZooKeeper

6) Memstore

7) HFile – the actual storage files, designed to store HBase’s data quickly and efficiently. Earlier versions of HBase used Hadoop’s MapFile, but its performance did not prove good enough.

Important points:

1) In standalone mode:
a. HBase runs all of its daemons and a local ZooKeeper in the same JVM.
b. HBase doesn’t use HDFS but the local filesystem instead.

For fully distributed deployments, ZooKeeper runs as a separate service.

2) The HBase Client automatically handles communicating with ZooKeeper and finding the relevant RegionServer with which to interact.

3) Ensure that the host and port numbers match in the two files below:

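/hadoop-2.6.0/etc/hadoop/core-site.xml
 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
 </property>

/hbase-install/hbase-0.98.9-hadoop2/conf/hbase-site.xml
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
 </property>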
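
To avoid comparing the two files by eye, the check can be scripted. The following is only a minimal sketch in Python: it assumes both files use the usual Hadoop property/name/value layout, and the inline XML strings and the helper names get_property and same_namenode are made up for illustration (in practice the text would be read from the two files above).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Inline stand-ins for core-site.xml and hbase-site.xml (illustrative only).
CORE_SITE = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>
</configuration>"""

HBASE_SITE = """<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://localhost:9000/hbase</value></property>
</configuration>"""


def get_property(xml_text, name):
    """Return the value of a named property, or raise KeyError if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    raise KeyError(name)


def same_namenode(core_site_xml, hbase_site_xml):
    """True if hbase.rootdir points at the same scheme, host and port as fs.default.name."""
    fs_uri = urlparse(get_property(core_site_xml, "fs.default.name"))
    root_uri = urlparse(get_property(hbase_site_xml, "hbase.rootdir"))
    return (fs_uri.scheme, fs_uri.hostname, fs_uri.port) == (
        root_uri.scheme, root_uri.hostname, root_uri.port)


if __name__ == "__main__":
    print("ports match" if same_namenode(CORE_SITE, HBASE_SITE) else "MISMATCH")
```

Only the scheme, host and port are compared; the /hbase path at the end of hbase.rootdir is deliberately ignored.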

4) After the Hadoop (start-dfs.sh), YARN (start-yarn.sh) and HBase (start-hbase.sh) services were started, I found the following using jps (the JVM Process Status tool):

hduser@localhost:~/yarn/hbase-install/hbase-0.98.9-hadoop2/conf$ jps
5147 Jps
3196 NameNode
3729 SecondaryNameNode
3957 ResourceManager
3425 DataNode
4619 HMaster
4171 NodeManager

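But I was expecting two more daemons, HQuorumPeer and HRegionServer, which unfortunately did not show up. A likely explanation is that hbase.cluster.distributed was still at its default of false in hbase-site.xml; in that case HBase starts the RegionServer and ZooKeeper inside the HMaster JVM (see point 1a above), so jps lists only HMaster.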
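
The same comparison between the jps listing and the daemons I expected can be done with a few lines of Python. This is just a sketch: the sample text is copied from the jps output above, and the function names running_daemons and missing_daemons are my own, not part of any Hadoop or HBase tool.

```python
# jps output copied from the listing above.
JPS_OUTPUT = """\
5147 Jps
3196 NameNode
3729 SecondaryNameNode
3957 ResourceManager
3425 DataNode
4619 HMaster
4171 NodeManager
"""


def running_daemons(jps_output):
    """Collect the process names from 'pid name' lines of jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_output, expected):
    """Return the expected daemon names that jps did not list, sorted."""
    return sorted(set(expected) - running_daemons(jps_output))


if __name__ == "__main__":
    expected = ["HMaster", "HRegionServer", "HQuorumPeer"]
    print(missing_daemons(JPS_OUTPUT, expected))
```

In a live setup the text could come from running jps through subprocess instead of the hard-coded sample.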

… work still in progress …

References:

1) HBase Reference Guide
2) HBase Architecture – Lars George
3) HBase Architecture – Sreejith
4) HBase Architecture – Altamira
5) RegionServer and DataNodes in HBase

BigData: interesting articles…

 
1. Tips for landing a job in the big data industry
2. Is healthcare ready for Big Data?

 

Miscellaneous…

Storage

1. SSD v/s HDD
2. SAN v/s NAS

 

Architecture/Data Model

1. Facebook’s architecture
2. Quora v/s Facebook architecture

 

NoSQL databases -

What is a NoSQL database?

Wikipedia says…
Couchbase says…
MongoDB says…
Planet Cassandra says…
TechRepublic says…
Oracle 12c says…

List of NoSQL databases

Official docs
HBase documentation
The MongoDB 2.6 manual
Cassandra 2.1 documentation

 

NoSQL Bookmarks

1. Eventual consistency & NoSQL
2. Difference between Horizontal and Vertical Scaling — Stack Overflow
3. Horizontal and Vertical Scaling — Blake Smith
4. Scale-Out v/s Scale-Up — Nati Shalom
5. Schema-on-Read v/s Schema-on-Write
6. Schema-on-Read — Techopedia
7. Cassandra v/s HBase — Info World

Let’s hadoop…

This is the first time I have seriously started looking into Apache Hadoop and its ecosystem components such as Pig, Hive, HCatalog, HBase, ZooKeeper, Oozie, Sqoop, Flume, Mahout, etc.

 
 

Hadoop/BigData Bookmarks —

1. Hadoop — Yahoo Developer Network
2. Hadoop Internals — Emilio Coppa
3. HDFS architecture — Apache
4. Map Reduce a simple introduction — Kaushik Sathupadi
5. Basic introduction to Apache Hadoop — HortonWorks
6. Configure Replication Factor and Block Size for HDFS
 
 
Hadoop ecosystem —

1. Apache Spark v/s Storm — StackOverflow
2. Apache Spark v/s Map Reduce — StackOverflow

 
 

Magazine —
1. Innovation Enterprise magazines

 
 

Silver Pockets Full -

Yesterday, I came across Chet Justice’s blog post on the phenomenon of “Silver Pockets Full”. Interesting!
It claims that a month with 5 Fridays, 5 Saturdays and 5 Sundays occurs only once every 823 years.

So the problem is to check whether this claim is correct, and also to find the upcoming months that satisfy this criterion.

I believe it can be done in any programming language, be it SQL, Java, Perl, Python, etc. I prefer SQL and got a query ready.

Let’s try it out.

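with t1 as
(
	-- one row per day: 50000 consecutive days starting from 1 January of the current year
	select trunc(sysdate,'yyyy')+level-1 lvl
	from dual
	connect by level <= 50000
)
---
, t2 as
(
	-- per month, count the days that fall on a Friday, Saturday or Sunday
	-- (day names are forced to English so the decode does not depend on the session language)
	select 
		to_char(lvl,'yyyymm') mx, 
		sum(decode(to_char(lvl,'dy','nls_date_language=english'),'sun',1,'sat',1,'fri',1,0)) count_x
	from t1
	group by to_char(lvl,'yyyymm')
)
---
-- 15 such days in a month means 5 Fridays, 5 Saturdays and 5 Sundays
select to_char(to_date(mx,'yyyymm'),'yyyy-Month') yyyy_Month
from t2
where count_x = 15
order by mx;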

Output:

YYYY_Month
------------
2014-August
2015-May
2016-January
2016-July
2017-December
2019-March
2020-May
2021-January
2021-October
2022-July
2023-December
2024-March
2025-August
2026-May
2027-January
2027-October
2028-December

... rest removed for brevity

138 rows selected 
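
As a cross-check in another language, here is a minimal Python sketch of the same search. It relies on the observation that only a 31-day month starting on a Friday can hold 5 Fridays, 5 Saturdays and 5 Sundays; the function name silver_pocket_months and the 2014 to 2028 range are my own choices, picked to line up with the rows shown above.

```python
import calendar


def silver_pocket_months(start_year, end_year):
    """Return (year, month) pairs that have 5 Fridays, 5 Saturdays and 5 Sundays."""
    matches = []
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            # monthrange gives (weekday of the 1st, number of days); Monday is 0.
            first_weekday, days_in_month = calendar.monthrange(year, month)
            # Days 29, 30 and 31 repeat the weekdays of days 1, 2 and 3,
            # so the month must have 31 days and begin on a Friday.
            if days_in_month == 31 and first_weekday == calendar.FRIDAY:
                matches.append((year, month))
    return matches


if __name__ == "__main__":
    for year, month in silver_pocket_months(2014, 2028):
        print(f"{year}-{calendar.month_name[month]}")
```

Since such months turn up in almost every year, both the query output and this sketch suggest that the "once in every 823 years" claim does not hold.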

Oracle database bookmarks -

General Oracle db concepts -

1. Why there is implicit commit before and after executing DDL Statements
2. Implicit commit with DDLs — AskTom
3. Why not use Redo log for consistent read
4. Difference between redo logs and undo tablespace
5. What does redo log files hold?
6. Why Hash Join goes slower?
7. TKPROF: high SQL*Net message from client + excessive parsing
8. What is the meaning of SQL*Net message from client — Ask Tom
9. Will Oracle lock the whole table while performing a DML statement or just the row? — Stack Overflow
10. Developing applications in Oracle APEX
11. What is High Water Mark (HWM) ?
12. Concept of High Water Mark (HWM) — Ask Tom
13. SQL Hints – Oracle 11.1 docs
14. Do I need to create indexes on foreign keys? — Stack Overflow

 
SQL and PL/SQL –

1. Query – merging intervals — Aketi Jyuuzou
2. Group By preserving the order — Aketi Jyuuzou
3. Counting continuous years — using Tabibitosan method
4. Tabibitosan method — Aketi Jyuuzou
5. REGEXP_REPLACE – Is this strange Or Am I Missing something? — Jeneesh

 
