
Installing Python packages…

This is my first post on Python, so I’m kinda glad about it.

Actually, I didn’t plan to write this up as a post, and certainly not with this content, but I finally decided to jot it down before I forget.

So, first things first…

To install a Python package on Python 3.4.2 from the Windows 7 (Professional) command prompt:

#1  Installing “httplib2” package

C:\Users\Energy>py -m pip install httplib2
Downloading/unpacking httplib2
Running setup.py (path:C:\Users\Energy\AppData\Local\Temp\pip_build_Energy\httplib2\setup.py) egg_info for package httplib2

Installing collected packages: httplib2
Running setup.py install for httplib2

Successfully installed httplib2
Cleaning up…

#2  Installing BeautifulSoup package

C:\Users\Energy>py -m pip install beautifulsoup4
Downloading/unpacking beautifulsoup4
Installing collected packages: beautifulsoup4
Successfully installed beautifulsoup4
Cleaning up…
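
A quick way to confirm that both installs worked is to ask Python whether it can locate the modules. The sketch below is not part of the session above; it assumes Python 3.4 or later (for importlib.util.find_spec), and the helper name is_installed is made up for illustration. Note that the beautifulsoup4 package is imported under the module name bs4.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # httplib2 is imported as "httplib2"; beautifulsoup4 is imported as "bs4".
    for name in ("httplib2", "bs4"):
        status = "found" if is_installed(name) else "NOT found"
        print(name, status)
```

Save it as a .py file and run it with the same launcher used for pip (py followed by the file name), so that it checks the same Python installation.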

 

Beginning HBase…

(Please note: this is not purely my own work; it is assembled from various sources on the internet. I am a complete newbie at this, and am just making an effort to prepare some personal notes here.)

I just started tinkering with HBase, trying to understand its architectural components, such as:

1) HMaster: a lightweight process responsible for monitoring all RegionServer instances in the cluster; it is the interface for all metadata changes. It also assigns Regions to RegionServers and balances load across them.

2) HRegion: a contiguous, sorted range of rows of a table. Regions are the basic unit that gets assigned to, and served by, RegionServers.

3) HRegionServer: responsible for serving and managing regions.

4) Quorum

5) ZooKeeper

6) Memstore

7) HFile: the actual storage files, designed to store HBase’s data quickly and efficiently. Earlier versions of HBase used Hadoop’s MapFile, but it did not perform well enough.
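
To make the HMaster / HRegion / HRegionServer relationship concrete, here is a toy model in Python. It is only a sketch and does not use any HBase API. It assumes that each region covers a contiguous, sorted range of row keys identified by its start key, and that each region is hosted by one RegionServer (in a real cluster the HMaster makes that assignment). The region layout, the server names and the function find_region_server are all invented for illustration.

```python
import bisect

# Hypothetical layout: (region start key, RegionServer hosting it), sorted by start key.
# The empty start key "" means "from the beginning of the table".
REGIONS = [
    ("", "regionserver-1"),
    ("g", "regionserver-2"),
    ("p", "regionserver-1"),
]


def find_region_server(row_key, regions=REGIONS):
    """Return (start_key, server) for the region whose key range contains row_key."""
    start_keys = [start for start, _ in regions]
    # bisect_right gives the number of start keys <= row_key;
    # the region we want is the last one of those.
    index = bisect.bisect_right(start_keys, row_key) - 1
    return regions[index]


if __name__ == "__main__":
    for key in ("apple", "grape", "zebra"):
        print(key, "->", find_region_server(key))
```

The point of the sketch is only that a row key maps to exactly one region, and a region to exactly one server at a time; how the real client discovers that mapping is a separate matter.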

Important points:

1) In standalone mode:
a. HBase runs all HBase daemons and a local ZooKeeper all in the same JVM.
b. HBase doesn’t use HDFS but the local filesystem instead.

For fully distributed deployments, ZooKeeper runs as a separate service.

2) The HBase Client automatically handles communicating with ZooKeeper and finding the relevant RegionServer with which to interact.

3) Ensure that the HDFS host and port match in both files, as below:

/hadoop-2.6.0/etc/hadoop/core-site.xml
 fs.default.name
 hdfs://localhost:9000

/hbase-install/hbase-0.98.9-hadoop2/conf/hbase-site.xml
 hbase.rootdir
 hdfs://localhost:9000/hbase

4) After the Hadoop (start-dfs.sh), YARN (start-yarn.sh) and HBase (start-hbase.sh) services were started, I found the following using jps (the JVM Process Status tool):

hduser@localhost:~/yarn/hbase-install/hbase-0.98.9-hadoop2/conf$ jps
5147 Jps
3196 NameNode
3729 SecondaryNameNode
3957 ResourceManager
3425 DataNode
4619 HMaster
4171 NodeManager

But I was expecting two more daemons, HQuorumPeer and HRegionServer, which unfortunately didn’t start.
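
Points 3 and 4 can also be checked with a small script instead of by eye. This is a minimal sketch, not something taken from the HBase documentation. It assumes the *-site.xml files use the usual configuration/property/name/value layout, and that jps prints one "pid name" pair per line, as in the listing above. The helper names and the list of expected daemons are my own assumptions. Keep point 1 in mind when reading the result: in standalone mode everything runs inside one JVM, so separate HQuorumPeer and HRegionServer processes would not be listed.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Parse Hadoop/HBase *-site.xml text into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def rootdir_matches_namenode(core_props, hbase_props):
    """True if hbase.rootdir lives under the HDFS URI from core-site.xml."""
    namenode = core_props.get("fs.default.name", "").rstrip("/")
    rootdir = hbase_props.get("hbase.rootdir", "")
    return bool(namenode) and rootdir.startswith(namenode + "/")


def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # "pid name"; skip blank or pid-only lines
            running.add(parts[1])
    return sorted(set(expected) - running)


if __name__ == "__main__":
    # Same values as in point 3, wrapped in the assumed XML layout.
    core = read_site_properties(
        "<configuration><property><name>fs.default.name</name>"
        "<value>hdfs://localhost:9000</value></property></configuration>"
    )
    hbase = read_site_properties(
        "<configuration><property><name>hbase.rootdir</name>"
        "<value>hdfs://localhost:9000/hbase</value></property></configuration>"
    )
    print("rootdir matches NameNode:", rootdir_matches_namenode(core, hbase))

    # The jps listing from point 4.
    jps_output = """5147 Jps
3196 NameNode
3729 SecondaryNameNode
3957 ResourceManager
3425 DataNode
4619 HMaster
4171 NodeManager
"""
    expected = ["HMaster", "HQuorumPeer", "HRegionServer"]
    print("missing daemons:", missing_daemons(jps_output, expected))
```

With the values from these notes, the port check passes and the two missing daemons are reported, which matches what I saw.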

… work still in progress …

References:

1) HBase Reference Guide
2) HBase Architecture – Lars George
3) HBase Architecture – Sreejith
4) HBase Architecture – Altamira
5) RegionServer and DataNodes in HBase