(Please note: this is not purely my own work; it is assembled from various sources on the internet. I am a complete newbie at this and am just making an effort to prepare some personal notes here.)
I have just started tinkering with HBase and am trying to understand its architectural components, such as:
1) HMaster: a lightweight process responsible for monitoring all RegionServer instances in the cluster. It is the interface for all metadata changes, and it assigns regions to RegionServers to balance load.
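2) HRegion: holds a contiguous range of rows of a table and is the basic unit that gets distributed across RegionServers.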
3) HRegionServer: responsible for serving and managing regions.
7) HFile: the actual storage file format, designed to store HBase’s data quickly and efficiently. HBase previously used Hadoop’s MapFile, but it did not perform well enough.
1) In standalone mode:
a. HBase runs all of its daemons and a local ZooKeeper in the same JVM.
b. HBase doesn’t use HDFS but the local filesystem instead.
For fully distributed deployments, ZooKeeper runs as a separate service.
2) The HBase Client automatically handles communicating with ZooKeeper and finding the relevant RegionServer with which to interact.
3) Ensure the port numbers are correct and match across the two configuration files, as below:
/hadoop-2.6.0/etc/hadoop/core-site.xml
    fs.default.name = hdfs://localhost:9000
/hbase-install/hbase-0.98.9-hadoop2/conf/hbase-site.xml
    hbase.rootdir = hdfs://localhost:9000/hbase
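To double-check that both files point at the same NameNode address, a small script can help. What follows is only a minimal sketch in Python: the two XML snippets are my reconstruction of the settings above in the usual Hadoop `<configuration>`/`<property>` layout, and the helper names `get_property` and `same_namenode` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumed contents of the two files, reconstructed from the values above.
# (Hadoop 2 also accepts fs.defaultFS as the newer name for fs.default.name.)
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""

HBASE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def same_namenode(fs_uri, rootdir_uri):
    """True if both URIs share the same scheme, host and port."""
    a, b = urlparse(fs_uri), urlparse(rootdir_uri)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)


fs_uri = get_property(CORE_SITE, "fs.default.name")
rootdir = get_property(HBASE_SITE, "hbase.rootdir")
print(same_namenode(fs_uri, rootdir))
```

To check a real installation, read the two files from disk instead of using the inline strings.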
4) After starting the Hadoop (start-dfs.sh), YARN (start-yarn.sh) and HBase (start-hbase.sh) services, I found the following using jps (the JVM Process Status tool):
hduser@localhost:~/yarn/hbase-install/hbase-0.98.9-hadoop2/conf$ jps
5147 Jps
3196 NameNode
3729 SecondaryNameNode
3957 ResourceManager
3425 DataNode
4619 HMaster
4171 NodeManager
However, I was expecting two more daemons, HQuorumPeer and HRegionServer, which unfortunately did not start.
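Comparing the jps listing against the daemons I expected can be scripted. This is only a rough sketch in Python: the process list is pasted from the output above, the expected names are the ones I was hoping to see, and the helper names `running_processes` and `missing_daemons` are made up for illustration.

```python
# jps output pasted from the run above (PID followed by process name).
JPS_OUTPUT = """\
5147 Jps
3196 NameNode
3729 SecondaryNameNode
3957 ResourceManager
3425 DataNode
4619 HMaster
4171 NodeManager
"""

# HBase daemons I expected to see as separate JVMs (my assumption).
EXPECTED_HBASE_DAEMONS = ["HMaster", "HQuorumPeer", "HRegionServer"]


def running_processes(jps_text):
    """Collect the process names from 'PID Name' lines of jps output."""
    names = set()
    for line in jps_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:  # skip blank lines and PID-only lines
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_text, expected):
    """Return the expected daemon names that do not appear in the jps output."""
    running = running_processes(jps_text)
    return [name for name in expected if name not in running]


missing = missing_daemons(JPS_OUTPUT, EXPECTED_HBASE_DAEMONS)
print(missing)  # ['HQuorumPeer', 'HRegionServer']
```

To run this against a live machine, the pasted text could be replaced with `subprocess.run(["jps"], capture_output=True, text=True).stdout`. If HBase is still running in standalone mode, a single HMaster entry would be consistent with point 1a above, since all daemons then share one JVM; I have not confirmed that this is the cause here.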
… work still in progress …