
Learning Notes: Software Architecture – Taking A Deeper Dive

Trade-offs and ATAM:

Fulfilling all stakeholders’ requirements is impossible.

Agile ATAM: design a conceptual architecture -> bring stakeholders in to define scenarios -> prioritize stakeholders’ requirements -> help negotiate the priorities -> redesign -> ask relevant stakeholder representatives, including programmers, to negotiate/evaluate -> evolve the design and involve stakeholders early and often (to deal with changes in scenarios/techniques)

Further reading: Software Architecture in Practice 3rd Edition. SEI digital library.

Continuous Delivery:

The waterfall approach does not give quick feedback; continuous delivery gets fast feedback by releasing new features constantly.

Holistic engineering: reduce friction, e.g. deriving the database schema from source code (?)

Automate your development, testing and deployment; use existing automated tools or develop your own, e.g. Selenium, cucumber-puppet.

When automating, keep an eye on the time constraint by using a timer, and avoid yak shaving. Good enough now is better than perfect in the future.

Differentiate aspects among static, dynamic and production; develop working software as soon as possible, building vertical slices over time.

Hide new features until they are finished: prefer feature toggles over feature branches (sync as often as necessary to avoid conflicts and the merge ambush); see the Togglz platform and Google’s trunk-based development.

Each change should be small and releasable: users don’t like giant changes, they prefer gradual changes as at Facebook. Canary releasing: use the router to move a chosen subset of users to the newer version. Dark releasing: roll out gradually, e.g. by timezone, as Facebook did with chat. Strangler pattern: each new change obsoletes a small percentage of the old features, and old and new live together while users switch to the new feature by themselves, like Cloudera’s switch to the new configuration UI.
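
As a rough sketch of the router-based canary idea (my own illustration, not from the talk; it assumes an nginx front end and made-up upstream addresses):

# send ~5% of users (keyed by client IP) to the new version
split_clients "${remote_addr}" $release {
    5%  canary;
    *   stable;
}

upstream stable { server 10.0.0.10:8080; }  # old version
upstream canary { server 10.0.0.20:8080; }  # new version

server {
    listen 80;
    location / {
        proxy_pass http://$release;
    }
}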

Walk Through of Installing Cloudera Manager on a Single Node

I am following the official guide for automated installation. While it seems easy at first glance, I had a rough time with it.

Before going on: CDH is considered a rather heavyweight distribution of Hadoop. I am using CDH for development, and so far my setups are (1) VMware Workstation on a Windows host with 16G RAM, with single-node CDH taking up to 8G, or (2) a Mac host with 8G RAM, with a Vagrant-managed VirtualBox VM taking up to 4G and 2 vcores for a single node (more than one node results in a non-functional CDH).

If you are behind a proxy, enable the proxy settings as stated in the guide. In addition, once you reach the phase of downloading parcels, remember to configure the parcel-download proxy settings through the web browser. Confirm any parcel download issue by checking /var/log/cloudera-scm-server/cloudera-scm-server.log.
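
For example, to watch the server log while the parcels download (the grep keywords are just my guess at what is useful):

sudo tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log | grep -iE 'parcel|proxy|error'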

There is no need to deal with Java yourself; just go with the default oracle-j2sdk.

Disable IPv6 by following the blog guide.
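
The guide essentially boils down to the usual sysctl switches on Ubuntu (my summary; the linked guide may differ in details):

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.lo.disable_ipv6=1
# persist across reboots
echo 'net.ipv6.conf.all.disable_ipv6 = 1' | sudo tee -a /etc/sysctl.conf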

Find the IP address of the machine and create an FQDN as suggested in the comments. Cloudera actually requires more than that:

The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:

  • Cluster hosts must have a working network name resolution system and correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must
    • Contain consistent information about hostnames and IP addresses across all hosts
    • Not contain uppercase hostnames
    • Not contain duplicate IP addresses

    A properly formatted /etc/hosts file should be similar to the following example:

    127.0.0.1	localhost.localdomain	localhost
    192.168.1.1	cluster-01.example.com	cluster-01
    192.168.1.2	cluster-02.example.com	cluster-02
    192.168.1.3	cluster-03.example.com	cluster-03

My takeaways are: 1) avoid using 127.0.0.1 from the loopback interface; use the IP assigned to the eth0 interface. 2) make the hostname the FQDN as well, by running `sudo hostname <FQDN>` and saving the name in `/etc/hostname` so it survives a reboot.
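
For example, using the cluster-01.example.com name from the sample /etc/hosts above:

sudo hostname cluster-01.example.com
echo cluster-01.example.com | sudo tee /etc/hostname
hostname -f   # should now print the FQDN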

Allow hosts to be resolved from the local /etc/hosts file by pointing NetworkManager’s dnsmasq at it:

cat /etc/NetworkManager/dnsmasq.d/hosts.conf
addn-hosts=/etc/hosts
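
Then restart NetworkManager so dnsmasq picks up the extra hosts file (on Ubuntu the service is typically called network-manager):

sudo service network-manager restart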

Disable the firewall and iptables as described in the guide.

CDH requires root access using a password or a private key. My take is that using a password for the root user is easier. Do the following.

Make sure openssh-server is installed and started

sudo apt-get install openssh-server

Give root a password and enable SSH root access:

sudo passwd root;

vi /etc/ssh/sshd_config;

PermitRootLogin yes

service ssh restart

Test SSH access as root:

ssh root@localhost

With the above configuration, installing Cloudera Manager and the cluster should work.

During cluster installation, if you need to retry from the web browser, you may need to manually remove the lock first:

sudo rm /tmp/.scm_prepare_node.lock

If you encounter any problem, you can always uninstall and get back to a clean state by following the uninstallation guide.

Note: during cluster installation, if the web browser does not show any progress bar, something is wrong. Check the root access configuration listed above.

======= After CDH is running: configuration, to be continued ======
CDH calculates settings (like memory allocation) for each host, but sometimes the resulting configuration is not checked against the minimum requirements of the installed components.

For example, the test installation that estimates Pi does not work unless the following memory settings in YARN are increased (weirdly enough, the YARN log does not point to anything useful):

– Set the Container Memory (yarn.nodemanager.resource.memory-mb) to 4GB
– Set the Java Heap Size of ResourceManager to 1GB
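
After raising those values, the Pi test can be re-run to verify the fix (the jar path is the usual CDH parcel location; adjust if yours differs):

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100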

The most useful way to review this is to check the non-default settings by switching to the new view in Cloudera Manager.

======= running mahout example ======
When trying to execute `mahout seq2sparse -i reuters-out-seqdir/ -o reuters-out-seqdir-lda -ow --maxDFPercent 85 --namedVector` with `MAHOUT_LOCAL` set to something non-null (i.e. running locally), I hit a Guava library version mismatch:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J

Mahout requires Guava 16.0, while Hadoop v2 uses Guava 11.0.

The solution is quite weird. I was simply going to turn on logging: reading CDH 5’s mahout script shows that it points the Mahout conf directory to /etc/mahout/conf.dist, so I put a simple log4j properties file under that directory. Surprisingly, the Guava problem then went away.
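
For reference, the file I dropped in was just a minimal console logger along these lines (the exact contents are my own; any valid log4j 1.x configuration should do):

sudo tee /etc/mahout/conf.dist/log4j.properties <<'EOF'
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n
EOF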