Walkthrough of Installing Cloudera Manager on a Single Node

I followed the official guide for automated installation. It seems easy at first glance, but I had a rough time with it.

Before going on, note that CDH is a fairly heavy distribution of Hadoop. I use CDH for development, and so far I have run it on (1) VMware Workstation on a Windows host with 16 GB RAM, with the single CDH node taking up to 8 GB, and (2) a Mac host with 8 GB RAM, with a Vagrant-managed VirtualBox VM taking up to 4 GB and 2 vcores for a single node (more than one node leaves CDH non-functional).

If you are behind a proxy, make sure to enable the proxy as stated in the guide. In addition, once you reach the parcel download phase, remember to configure the parcel download proxy settings through the web browser. To diagnose parcel download problems, check `/var/log/cloudera-scm-server/cloudera-scm-server.log`.
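
To narrow down that log quickly, a small helper along these lines may help. This is only a sketch: the function name `parcel_log_errors` is made up for illustration, and the match patterns (`parcel` plus `ERROR`, `WARN` or `Exception`) are my guess at what is worth looking for, not a documented log format.

```shell
# Print the last 20 log lines that mention parcels together with an error marker.
# The patterns are assumptions about what is useful, not a documented format.
parcel_log_errors() {
    log_file="${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
    grep -i 'parcel' "$log_file" | grep -E 'ERROR|WARN|Exception' | tail -n 20
}
```

Call it as `parcel_log_errors` for the default path, or pass another log file as the first argument. Reading the server log may require sudo.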

There is no need to deal with Java; just go with the default oracle-j2sdk.

Disable IPv6 by following the blog guide.

Find the IP address of the node and create an FQDN for it, as suggested in the comments. Cloudera requires more than that, though:

The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:

  • Cluster hosts must have a working network name resolution system and correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must
    • Contain consistent information about hostnames and IP addresses across all hosts
    • Not contain uppercase hostnames
    • Not contain duplicate IP addresses

    A properly formatted /etc/hosts file should be similar to the following example:

    127.0.0.1     localhost.localdomain   localhost
    <ip-address>  cluster-01.example.com  cluster-01
    <ip-address>  cluster-02.example.com  cluster-02
    <ip-address>  cluster-03.example.com  cluster-03

My takeaway is: 1) avoid loopback interface addresses and use the IP assigned to the eth0 interface instead; 2) make the hostname an FQDN as well, by running `sudo hostname <FQDN>` and saving the name in `/etc/hostname` so that it survives a reboot.
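
Two of the /etc/hosts rules quoted above (no uppercase hostnames, no duplicate IP addresses) are easy to check mechanically. Below is a rough sketch; `check_hosts_file` is a hypothetical helper of mine, not something shipped with Cloudera, and it does not check consistency across hosts or DNS resolution.

```shell
# Report uppercase hostnames and duplicate IP addresses in a hosts file.
# Returns non-zero if any problem is found. Illustrative helper only.
check_hosts_file() {
    hosts_file="${1:-/etc/hosts}"
    awk '
        /^[[:space:]]*(#|$)/ { next }      # skip comment-only and blank lines
        {
            sub(/#.*/, "")                 # strip trailing comments
            if (seen[$1]++) { print "duplicate IP: " $1; bad = 1 }
            for (i = 2; i <= NF; i++)
                if ($i ~ /[[:upper:]]/) { print "uppercase hostname: " $i; bad = 1 }
        }
        END { exit bad + 0 }
    ' "$hosts_file"
}
```

Running `check_hosts_file` with no argument checks `/etc/hosts`; no output and a zero exit status mean neither problem was found.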

Make the host resolvable from the local /etc/hosts file through the NetworkManager dnsmasq configuration:

cat /etc/NetworkManager/dnsmasq.d/hosts.conf 

Disable the firewall and iptables as described in the guide.

CDH requires root access using either a password or a private key. I find using a password for the root user easier. Do the following:

Make sure openssh-server is installed and started

sudo apt-get install openssh-server

Give root a password and enable SSH root access:

sudo passwd root

sudo vi /etc/ssh/sshd_config

In that file, set:

PermitRootLogin yes

Then restart the SSH service:

sudo service ssh restart

Test ssh access as root

ssh root@localhost
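
If you rebuild the VM often, the sshd_config edit can be scripted instead of done by hand in vi. The following is a sketch under my own assumptions: `enable_root_login` is a hypothetical helper, and it removes every existing `PermitRootLogin` line (commented or not) and writes the new setting at the top of the file, since sshd uses the first value it finds for a keyword.

```shell
# Force "PermitRootLogin yes" in an sshd_config-style file.
# Existing PermitRootLogin lines (commented or not) are removed and the
# setting is written as the first line. Illustrative helper only.
enable_root_login() {
    conf="${1:-/etc/ssh/sshd_config}"
    tmp=$(mktemp) || return 1
    {
        echo 'PermitRootLogin yes'
        sed -E '/^[[:space:]]*#?[[:space:]]*PermitRootLogin([[:space:]]|$)/d' "$conf"
    } > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Editing the real file needs root, so run the function from a root shell, and restart the SSH service afterwards as shown above. Running `sshd -t` first is a cheap way to catch a broken config before restarting.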

With the above configuration, installing Cloudera Manager and then the cluster should work.

During cluster installation, if you need to retry from the web browser, you may need to remove the lock manually:

sudo rm /tmp/.scm_prepare_node.lock

If you run into any problem, you can always uninstall and get back to a clean state by following the uninstallation guide.

Note: during cluster installation, if the web browser does not show any progress bar, something is wrong. Check the root access setup described above.

======= after CDH is running: configuration, continued ======
CDH calculates the settings (like memory allocation) for the host, but sometimes the configuration is not checked against the minimum requirements of the installed components.

For example, the test job that estimates pi does not work unless you increase the following memory settings in YARN (weirdly enough, the YARN log does not point to anything useful):

– Set the Container Memory (yarn.nodemanager.resource.memory-mb) to 4GB
– Set the Java Heap Size of ResourceManager to 1GB

The most useful way to review such changes is to check the non-default settings by switching to the new view.

======= running the Mahout example ======
When executing `mahout seq2sparse -i reuters-out-seqdir/ -o reuters-out-seqdir-lda -ow --maxDFPercent 85 --namedVector` with `MAHOUT_LOCAL` set to "something not null" (meaning it runs locally), I hit a Guava library version mismatch:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J

Mahout requires Guava 16.0, while Hadoop 2 uses Guava 11.0.
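
To see which Guava jars are actually lying around, something like the following can help. It is just a sketch: `list_guava_jars` is a made-up helper, and the directories in the usage line are guesses at where a parcel-based CDH install keeps its jars, so adjust them to your layout.

```shell
# List every guava-*.jar under the given directories, sorted by path.
# Illustrative helper; pass the directories you want to search.
list_guava_jars() {
    find "$@" -name 'guava-*.jar' 2>/dev/null | sort
}
```

For example, `list_guava_jars /opt/cloudera/parcels/CDH/jars /usr/lib/mahout` (both paths are assumptions). Seeing more than one version in the output is a hint that the classpath order decides which `Stopwatch` class gets loaded.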

The solution is quite weird. I was simply going to turn on logging after reading CDH 5's mahout script, which points the Mahout conf directory to /etc/mahout/conf.dist. I put a simple log4j properties file in that directory. Surprisingly, the Guava problem went away.

