Monthly Archives: January 2015

Beginning HBase…

(Please note: this is not purely my own work; it is assembled from various sources on the internet. I am a complete newbie at this and am just trying to prepare some personal notes here.)

I just started tinkering with HBase and am trying to understand its architectural components, such as:

1) HMaster: a lightweight process responsible for monitoring all RegionServer instances in the cluster. It is the interface for all metadata changes, and it assigns regions to RegionServers for load balancing.

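2) HRegion: a contiguous range of rows of one table. It is the unit that gets assigned to, and served by, a RegionServer.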

3) HRegionServer: responsible for serving and managing regions.

4) Quorum

5) ZooKeeper

6) Memstore

7) HFile: the actual storage files, designed to store HBase’s data quickly and efficiently. Previously, Hadoop’s MapFile was used in HBase, but its performance did not prove good enough.
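To make the "HMaster assigns regions to RegionServers" idea concrete for myself, here is a toy sketch in Python. It is only an illustration under my own assumptions: the function name `assign_regions`, the region names and the server names are all made up, and plain round-robin is a stand-in, not the balancing logic HBase actually uses.

```python
from collections import defaultdict


def assign_regions(regions, servers):
    """Spread regions over servers round-robin (toy stand-in for a balancer)."""
    if not servers:
        raise ValueError("at least one RegionServer is required")
    assignment = defaultdict(list)
    for i, region in enumerate(regions):
        # Region i goes to server (i mod number_of_servers).
        assignment[servers[i % len(servers)]].append(region)
    return dict(assignment)


# Hypothetical cluster: five regions, two RegionServers.
regions = ["region-%d" % i for i in range(5)]
servers = ["rs1", "rs2"]
plan = assign_regions(regions, servers)
for server in servers:
    print(server, plan.get(server, []))
```

The point is only that each region lives on exactly one server at a time and that the master decides which one. A real balancer would presumably look at load, not just counts.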

Important points:

1) In standalone mode:
a. HBase runs all of its daemons and a local ZooKeeper in the same JVM.
b. HBase uses the local filesystem instead of HDFS.

For fully distributed deployments, ZooKeeper runs as a separate service.

2) The HBase Client automatically handles communicating with ZooKeeper and finding the relevant RegionServer with which to interact.

3) Ensure the port numbers are correct, as below:



4) After the Hadoop, YARN and HBase services were started, I found the following using jps (JVM Process Status):

hduser@localhost:~/yarn/hbase-install/hbase-0.98.9-hadoop2/conf$ jps
5147 Jps
3196 NameNode
3729 SecondaryNameNode
3957 ResourceManager
3425 DataNode
4619 HMaster
4171 NodeManager

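But I was expecting two more daemons, HQuorumPeer and HRegionServer, which unfortunately didn’t start. One possible explanation, going by point 1 above: if HBase is still running in standalone mode, these may be running inside the HMaster JVM and so would not show up as separate processes.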
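The check in point 4 can be scripted instead of eyeballed. Below is a small Python sketch that takes the jps output shown above as text and reports which of the HBase daemons I was expecting are absent. The helper names (`running_processes`, `missing_daemons`) are my own, and the expected set is simply the three daemons mentioned in this post, not an official list.

```python
# The jps output from point 4, pasted in as a string.
JPS_OUTPUT = """\
5147 Jps
3196 NameNode
3729 SecondaryNameNode
3957 ResourceManager
3425 DataNode
4619 HMaster
4171 NodeManager
"""

# Daemons I expected to see as separate processes (my assumption).
EXPECTED_HBASE = {"HMaster", "HRegionServer", "HQuorumPeer"}


def running_processes(jps_text):
    """Return the set of process names from jps output ('<pid> <name>' per line)."""
    names = set()
    for line in jps_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_text, expected=EXPECTED_HBASE):
    """Return the expected daemon names that do not appear in the jps output."""
    return sorted(set(expected) - running_processes(jps_text))


print(missing_daemons(JPS_OUTPUT))
```

On a live machine the string could be replaced by the captured output of the jps command. I kept it as pasted text so that the sketch runs without Java installed.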
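Going back to point 2: as far as I understand it, the last step of "finding the relevant RegionServer" comes down to mapping a row key onto the region whose key range contains it. The Python sketch below models only that step, with a made-up, hard-coded region table (`REGIONS`, `locate` and the server names are hypothetical). The real client learns region locations via ZooKeeper and HBase's own metadata rather than from a list like this.

```python
import bisect

# Hypothetical region table: (start_key, server), sorted by start key.
# Each region covers [its start key, the next region's start key).
# The first region starts at the empty key, so every row key lands somewhere.
REGIONS = [
    ("", "rs1"),
    ("g", "rs2"),
    ("p", "rs3"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate(row_key):
    """Return the server hosting the region whose key range contains row_key."""
    # The rightmost start key that is <= row_key identifies the region.
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]


for key in ["apple", "g", "zebra"]:
    print(key, "->", locate(key))
```

Because the start keys are sorted, the lookup is a binary search. A row key equal to a region's start key belongs to that region, not to the previous one.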

… work still in progress …


1) HBase Reference Guide
2) HBase Architecture – Lars George
3) HBase Architecture – Sreejith
4) HBase Architecture – Altamira
5) RegionServer and DataNodes in HBase


BigData: interesting articles…

1. Tips for landing a job in the big data industry
2. Is healthcare ready for Big Data?