(Please note: this is not purely my own work; it has been assimilated from various sources on the internet. I am a complete newbie here and am just trying to put together some personal notes.)
I have just started tinkering with HBase, and these are its main architectural components as I understand them:
1) HMaster: a lightweight process that monitors all RegionServer instances in the cluster and is the interface for all metadata changes. It assigns Regions to RegionServers for load balancing.
2) HRegion: the basic unit of distribution and availability in HBase. Each Region holds a contiguous, sorted range of a table's rows, from a start key to an end key.
3) HRegionServer: serves and manages Regions, handling reads and writes for the Regions assigned to it.
4) HFile: the actual storage files in which HBase persists its data quickly and efficiently. Earlier versions of HBase used Hadoop's MapFile, which did not perform well enough.
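To get the component roles straight in my head, here is a toy Python model. It is my own sketch, not HBase code and not its API. A Region is a sorted row-key range. A toy HMaster hands each Region to the least-loaded RegionServer. A toy client lookup then finds which RegionServer serves a given row key. The least-loaded rule, the names Region, RegionServer, assign_regions and locate, and the sample keys are all hypothetical. In real HBase the client locates Regions through the hbase:meta table, whose location it reads from ZooKeeper, and the real balancer is far more sophisticated.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy Region covering the sorted row-key range [start_key, end_key).
    An empty start_key means "from the first row"; an empty end_key means
    "to the last row" (HBase uses the same convention at table boundaries)."""
    name: str
    start_key: bytes
    end_key: bytes


@dataclass
class RegionServer:
    """Toy RegionServer that just records the Regions assigned to it."""
    name: str
    regions: list = field(default_factory=list)


def assign_regions(regions, servers):
    """Toy HMaster: give each Region to the currently least-loaded server.
    This is a hypothetical rule for illustration, not HBase's balancer."""
    assignment = {}
    for region in regions:
        # min() returns the first server on ties, so the result is deterministic.
        target = min(servers, key=lambda s: len(s.regions))
        target.regions.append(region)
        assignment[region.name] = target.name
    return assignment


def locate(row_key, regions, assignment):
    """Toy client lookup: return (region name, server name) for row_key.
    Assumes the Regions are contiguous and the first starts at b""."""
    ordered = sorted(regions, key=lambda r: r.start_key)
    starts = [r.start_key for r in ordered]
    # Binary search for the rightmost Region whose start_key <= row_key.
    region = ordered[bisect_right(starts, row_key) - 1]
    return region.name, assignment[region.name]


if __name__ == "__main__":
    # Hypothetical table split into three Regions, served by two RegionServers.
    regions = [
        Region("r1", b"", b"g"),
        Region("r2", b"g", b"p"),
        Region("r3", b"p", b""),
    ]
    servers = [RegionServer("rs1"), RegionServer("rs2")]
    assignment = assign_regions(regions, servers)
    print(assignment)                             # {'r1': 'rs1', 'r2': 'rs2', 'r3': 'rs1'}
    print(locate(b"hbase", regions, assignment))  # ('r2', 'rs2')
```

Because Regions are split by sorted key ranges, finding the Region for a row reduces to a binary search over the Region start keys.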
1) In standalone mode:
a. HBase runs all of its daemons and a local ZooKeeper in the same JVM.
b. HBase doesn’t use HDFS but the local filesystem instead.
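In a fully distributed deployment, ZooKeeper runs as a separate service. This is either an external ensemble or HQuorumPeer processes started by HBase itself, when HBASE_MANAGES_ZK is set to true in hbase-env.sh.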
2) The HBase Client automatically handles communicating with ZooKeeper and finding the relevant RegionServer with which to interact.
3) Make sure the port numbers are correct, as below:
4) After starting the Hadoop (start-dfs.sh), YARN (start-yarn.sh) and HBase (start-hbase.sh) services, I found the following processes using jps (Java Virtual Machine Process Status Tool):
However, I was expecting two more daemons, HQuorumPeer and HRegionServer, which unfortunately did not start.
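To make sense of the jps output, here is a small Python sketch that compares it against the HBase daemons I would expect on a single machine. It is my own code, and the helper names are hypothetical. The expectation table is an assumption built on the standalone notes above. In standalone mode (hbase.cluster.distributed left at its default of false), all HBase daemons share the one HMaster JVM. HQuorumPeer and HRegionServer would then not appear as separate processes at all. They should show up on their own only in (pseudo-)distributed mode, with HBase managing ZooKeeper.

```python
# Assumed process names that jps prints for each single-machine HBase setup:
# standalone runs everything inside the HMaster JVM; pseudo-distributed with
# HBASE_MANAGES_ZK=true runs HMaster, HRegionServer and HQuorumPeer separately.
EXPECTED_HBASE_DAEMONS = {
    "standalone": {"HMaster"},
    "pseudo-distributed": {"HMaster", "HRegionServer", "HQuorumPeer"},
}


def parse_jps(output):
    """Collect process names from jps output lines such as '3310 HMaster'."""
    names = set()
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines and lines that carry only a PID (no class name).
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(output, mode):
    """Return, sorted, the expected HBase daemons absent from the jps output."""
    return sorted(EXPECTED_HBASE_DAEMONS[mode] - parse_jps(output))


if __name__ == "__main__":
    # Hypothetical jps output after start-dfs.sh, start-yarn.sh and start-hbase.sh.
    sample = """2101 NameNode
2233 DataNode
2456 SecondaryNameNode
2890 ResourceManager
3012 NodeManager
3310 HMaster
3555 Jps"""
    print(missing_daemons(sample, "standalone"))          # []
    print(missing_daemons(sample, "pseudo-distributed"))  # ['HQuorumPeer', 'HRegionServer']
```

With this sample, nothing is missing for standalone mode, while pseudo-distributed mode reports exactly the two daemons I was expecting.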
… work still in progress …
References:
1) HBase Reference Guide
2) HBase Architecture – Lars George
3) HBase Architecture – Sreejith
4) HBase Architecture – Altamira
5) RegionServer and DataNodes in HBase