Author Archives: reachumashankar

About reachumashankar

Passionate about enabling operational excellence and cost optimization through cutting-edge technology. Rapid cloud enablement and Big Data are the current technology focus areas. Enterprises & ISVs are the business focus segments.

Extending AWS NAT instance utilization – Part 1

Those of us who have been through setting up a VPC (Virtual Private Cloud) in AWS know the need to have a NAT instance running. Since we have to keep this instance running as long as instances in the Private subnets need easy internet access, we had always wondered how else this instance could be utilized. In reality, these NAT instances perform an outward Port Address Translation from instances in the internal Private subnets to the external internet.

Typical NAT Server in AWS:



While there are those who may argue that the NAT instance is better left alone, we believe there are ways to extend its abilities without noticeable impact on its primary utility. Typically, a NAT instance provisioned using the AWS wizard allows you to choose between small / medium instance types. After providing for the internet-access needs of your private subnet instances (in most typical use cases), there are enough spare cycles that we can take advantage of.

Note: In addition to the above, as we know, a NAT instance will have an Elastic IP assigned (VPC type), which is a scarce resource (limited to 5 by default; you need to request more from Amazon). This gives one more reason to put an instance with an EIP (Elastic IP) to best use.

Let us look at a few extensions we were able to successfully harness, both for our clients and for our own needs here at CloudKinetics.

Extension-1: PAT (Inward Tunnel): enable access to a specific port on an instance in the Private subnet from the Internet

Though at first it seems appalling to have direct access from the internet to an instance running in a Private subnet (as the very purpose of a Private subnet is to avoid direct access from the internet), this scenario is not uncommon when you have a VPC but no VPN configured and, say, a DB server running in the private subnet.

In the above scenario, if we need to access the DB server (in the Private subnet within the VPC) from Toad or other SQL clients on the enterprise / home network to quickly run a sql query, then we need direct access. The alternative would be to run such clients from another instance within the VPC. Nevertheless, it is always convenient to run them from our own laptop / desktop at work. Though we want to enable this access, we need to ensure it is restricted to only this instance, to a specific port, and to a specific external IP (or range of IPs).

Let us assume we have Oracle on RHEL running in the Private subnet of our VPC (say 10.0.1.*), primarily accessed by other application instances inside the VPC. If you need to run some porting scripts or initializing DDL scripts from a client, you need access to the Oracle instance running in the Private subnet: open port 1532 from the enterprise public IP.

It is a three-step process,

a) Getting the IPTABLES update command ready

b) Updating the IPTABLES configuration on the NAT instance to ensure the rules are retained after a restart.

c) Open the required port in the NAT Server SecurityGroup

By default, the AWS NAT AMI implements the outward PAT ability through the IPTABLES service. The configuration for the service is applied on startup from a script under /usr/local/sbin/

Edit this file as root and find the line below:

/sbin/iptables -t nat -A POSTROUTING -o eth0 -s ${VPC_CIDR_RANGE} -j MASQUERADE

Update it to look as below, including two additional lines that allow access from the external internet to the DB server on port 1532.

 /sbin/iptables -t nat -A POSTROUTING -o eth0 -s ${VPC_CIDR_RANGE} -j MASQUERADE && \
 /sbin/iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 51532 -j DNAT --to <DB_PRIVATE_IP>:1532 && \
 /sbin/iptables -A FORWARD -p tcp -d <DB_PRIVATE_IP> --dport 1532 -j ACCEPT

Please take care to append the “&& \” to the existing MASQUERADE command line. Replace <DB_PRIVATE_IP> with the private IP address of the DB server instance (an address in the 10.0.1.* range in our example).


Now it is time to reboot the server. With the above step we have configured port 51532 on the NAT server to be open to the internet, which in turn does a Port Address Translation to send traffic to port 1532 of the instance in the local VPC Private subnet.

To restrict and open access from our enterprise IP address to the NAT server on port 51532, we need to configure the NATServerGroup (or the Security Group assigned to the NAT server) as below.
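For reference, the inbound rule on the NAT server's security group would look something like this (the group name follows the example above; the source IP is a placeholder for your enterprise public IP):

```
Security Group: NATServerGroup
  Inbound rule:
    Type:     Custom TCP
    Protocol: TCP
    Port:     51532
    Source:   <enterprise-public-ip>/32
```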


That is it: we have now enabled the NAT server to do an inward Port Address Translation. We can extend this to other instances / ports as required. Also, the above steps can be automated using simple shell scripts.
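As a starting point for such a script, here is a minimal sketch; the variable names and the IP value are placeholders, so substitute your own values before applying the rules on the NAT instance:

```shell
#!/bin/sh
# Sketch: emit the inward-PAT rules for one private-subnet service.
# DB_IP, EXT_PORT and INT_PORT are placeholders -- set them for your setup.
DB_IP=""
EXT_PORT=51532
INT_PORT=1532

RULES="/sbin/iptables -A PREROUTING -t nat -i eth0 -p tcp --dport ${EXT_PORT} -j DNAT --to ${DB_IP}:${INT_PORT}
/sbin/iptables -A FORWARD -p tcp -d ${DB_IP} --dport ${INT_PORT} -j ACCEPT"

# Print the rules for review; pipe to 'sudo sh' on the NAT instance to apply.
echo "$RULES"
```

Generating the rules first and applying them separately makes it easy to review the exact iptables commands before they touch a running NAT instance.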




In the next parts we will cover other possible extensions, as listed below:

Part 2 on Extension-2: NFS (Network File System) – Shared drive for VPC instances (and the Enterprise)

Part 3 on Extension-3: SFTP Server for File upload

Hadoop Cluster using Whirr – BYON in AWS VPC – Part 2

In this article series we will look at the steps in creating a Whirr base instance which will be used to launch a Hadoop cluster over your custom-created instances “using AWS VPC – BYON (Bring Your Own Network)”, i.e., on pre-setup machine instances in AWS VPC.

In this article we will focus on how to install whirr on the Whirr base instance & how to launch the hadoop cluster.

As discussed in Part 1, we should have a whirr base instance with the whirr tar extracted into the Ubuntu home directory.

Step1: Log on and get whirr onto the path.

You can use other ways of adding it to the path as well; here we will create a “soft link” to the whirr binary in /usr/bin.

 $ sudo ln -s /home/ubuntu/whirr-0.8.1/bin/whirr /usr/bin/whirr

Now enter

$ whirr version


This should return the Apache whirr & jclouds version.

Step2: Create an ssh key for the hadoop cluster

Create an ssh key pair using the command below. This should generate id_rsa and in the .ssh folder under the current user's home (/home/ubuntu/.ssh).

$ ssh-keygen -t rsa -P ''


Step3: Next step is to launch the Amazon EC2 VPC instance for hadoop cluster

Recollect that in the earlier article (Part 1) we defined 3 dns names and ip addresses for our hadoop cluster. We will launch a hadoop cluster with 1 master (jobtracker + namenode) and 2 workers (tasktracker + datanode).

So we need to launch 3 instances of the desired machine size into the subnet of the VPC where the ‘whirr base’ instance is running. We will use the same Ubuntu 12.04.1 template available as standard from Amazon.

Note: Use the keypair generated in the step above as the keypair for launching these instances. That is, import this key into your aws account before you start launching your instances.





Step4: Configure each node with the DNS name and DNS server

Access each of the instances via ssh using the ssh key and make the following changes on each one of them.

$ sudo vi /etc/dhcp/dhclient.conf

Make the changes to update the domain-name to ‘ck.local’ and the domain-name-servers to the name server’s ip address.


Now update the hostname to the corresponding name configured in the dns server for each ip address. Note: the hostnames used must be exactly the same as those configured against the corresponding ip addresses in the DNS server (refer to Part 1).

$ sudo vi /etc/hostname

Delete the current name entry and set it to node1, node2 or node3 according to each instance's ip address as configured in the DNS server. Save the file and reboot.

$ sudo reboot

Now we are ready to launch the hadoop cluster.

Step5: Configure Whirr for BYON & CDH and launch hadoop cluster

Now that we have all the instances ready for hadoop deployment, it is time to configure whirr for BYON (bring your own network). Our network, as we recollect, is as follows:

3 instances with Ubuntu 12.04, 64 bit, default sudo user name: ‘ubuntu’

Copy two configuration files from the existing whirr installation’s ‘recipes’ folder:

$ cp whirr-0.8.1/recipes/ ~/

$ cp whirr-0.8.1/recipes/nodes-byon.yaml ~/cdh-byon.yaml

Now edit the copied properties file and change the following:


# Change the name of cluster admin user
whirr.cluster-user=ubuntu

# Change the number of machines in the cluster here
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
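For completeness, the other BYON-related entries in the recipe would look roughly like this; the property names follow the Whirr/jclouds BYON conventions, and the cluster name and file paths are assumptions based on the setup above:

```
whirr.provider=byon
whirr.cluster-name=hadoopbyon
whirr.private-key-file=/home/ubuntu/.ssh/id_rsa
whirr.public-key-file=/home/ubuntu/.ssh/
jclouds.byon.endpoint=file:///home/ubuntu/cdh-byon.yaml
```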




Now edit and update the cdh-byon.yaml file for our network.

    - id: ubuntu1
      os_arch: x86_64
      os_family: ubuntu
      os_description: ubuntu
      os_version: 12.04
      group: ubuntu
      username: ubuntu
      credential_url: file:///home/ubuntu/.ssh/id_rsa
    - id: ubuntu2
      os_arch: x86_64
      os_family: ubuntu
      os_description: ubuntu
      os_version: 12.04
      group: ubuntu
      username: ubuntu
      credential_url: file:///home/ubuntu/.ssh/id_rsa
    - id: ubuntu3
      os_arch: x86_64
      os_family: ubuntu
      os_description: ubuntu
      os_version: 12.04
      group: ubuntu
      username: ubuntu
      credential_url: file:///home/ubuntu/.ssh/id_rsa


That is it, we are now ready to launch our cluster.

Step6: Launch CDH hadoop cluster through whirr

Execute the command below from /home/ubuntu as the ubuntu user.

$ whirr launch-cluster --config

This will take some time, and finally you should get a confirmation message as below with the URLs to access the Namenode status and JobTracker status.


With that, we have a CDH hadoop cluster launched via Whirr using BYON into Amazon AWS VPC.

Job Tracker


Name Node




Note: You can now stop and start the instances as needed for any development / testing needs of the hadoop cluster.


Part 1:

Hadoop Cluster using Whirr – BYON in AWS VPC – Part 1

In this article series we will look at the steps in creating a Whirr base instance which will be used to launch a Hadoop cluster over your custom-created instances “using AWS VPC – BYON (Bring Your Own Network)”, i.e., on pre-setup machine instances in AWS VPC. In this article we will focus on how to create the Whirr base instance.

The key challenge in getting a hadoop cluster through Whirr – BYON over Amazon AWS VPC is that each instance in the Hadoop cluster must have a hostname resolvable by both forward and reverse dns lookup on its network. AWS VPC instances are assigned only ip addresses, no dns names, and are not associated with a local dns. So we will have to set up a local dns server on the Whirr base instance and then set up Whirr with OpenJDK. In the local dns we will configure the dns names and ip addresses of the instances we will use for the hadoop cluster launch.

Creating a Whirr base template with the ability to launch BYON.

To start with, launch a default Ubuntu 12.04.1 instance (small) and switch to root:

$ sudo su

Step1: Install DNS Server: bind9

$ apt-get install bind9
$ cd /etc/bind

Step2: Forward dns lookup setup: db.<dns domain>

Decide on your local dns domain name, say “ck.local”. The current machine will be the SOA & NameServer (NS) for the domain. We will call the current machine (hostname) “dc”, meaning its fully qualified name is dc.ck.local.

Make a copy of the db.local file:

$ cp db.local db.ck.local

Edit and update as below:

With the SOA set to dc.ck.local. and an administrator address for the domain (instead of root@ck.local).

I wanted to configure 3 more machines in the dns besides the current one (dc, in my case).

Make the others node1.ck.local, node2.ck.local and node3.ck.local.
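The resulting forward zone file would look roughly like this; the serial and timers are the Debian defaults, and the A-record addresses are placeholders for your instances' private IPs:

```
; db.ck.local -- forward lookup zone for ck.local (sketch)
$TTL    604800
@       IN      SOA     dc.ck.local. admin.ck.local. (
                              2         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
@       IN      NS      dc.ck.local.
dc      IN      A       <dc-private-ip>
node1   IN      A       <node1-private-ip>
node2   IN      A       <node2-private-ip>
node3   IN      A       <node3-private-ip>
```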


Step3: Reverse Lookup db.<ip address range in reverse>

Now we need to create the reverse lookup. Copy db.0 and name it db.1.0.10 (since all my machines are going to be in the ip range 10.0.1.*), and update as below.
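The reverse zone file follows the same pattern, with PTR records mapping the last octet of each 10.0.1.* address back to its name; the octets below are placeholders:

```
; db.1.0.10 -- reverse lookup zone for 10.0.1.* (sketch)
$TTL    604800
@       IN      SOA     dc.ck.local. admin.ck.local. (
                              2         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
@       IN      NS      dc.ck.local.
<dc-octet>      IN      PTR     dc.ck.local.
<node1-octet>   IN      PTR     node1.ck.local.
<node2-octet>   IN      PTR     node2.ck.local.
<node3-octet>   IN      PTR     node3.ck.local.
```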


Step4: Include the zones in the named configuration

Next step is to update the bind named configuration to include all these files.

$ vi named.conf.default-zones

Update as below to include the new db.ck.local file (instead of db.local) & add a new zone entry for db.1.0.10.
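The two zone entries would look something like this, with file paths following the Debian bind9 layout:

```
zone "ck.local" {
        type master;
        file "/etc/bind/db.ck.local";
};

zone "" {
        type master;
        file "/etc/bind/db.1.0.10";
};
```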


Step5: Update External DNS forwarder

Now the next update is to configure a forwarder for external dns names, pointing to the default dns server of the VPC.

$ vi named.conf.options
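The forwarders block in named.conf.options would look like this; the address is a placeholder (the VPC's default DNS server sits at the VPC network base address plus two, e.g. for a VPC):

```
options {
        directory "/var/cache/bind";

        forwarders {
                <vpc-dns-ip>;   // placeholder: the VPC's default DNS server
        };
};
```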


Step6: Update the current machine hostname & nameserver configurations.

On Ubuntu this is indirectly controlled by the dhcp configuration, and editing /etc/resolv.conf directly is not the right idea.

$ vi /etc/dhcp/dhclient.conf


Update the ‘send host-name’ entry to the machine's hostname, then uncomment the “supersede domain-name” entry and set it to the chosen domain name, and uncomment the “prepend domain-name-servers” entry and set it to the current machine's ip address, making it the dns server (I had also added the VPC dns server, which is not really required).
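The relevant dhclient.conf lines would end up looking like this; the name-server address is a placeholder for the current machine's private IP:

```
# /etc/dhcp/dhclient.conf (sketch)
send host-name "dc";
supersede domain-name "ck.local";
prepend domain-name-servers <dc-private-ip>;
```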

Now make sure bind9 is part of the startup services. I did that using:

$ chkconfig bind9 on

If you don’t have chkconfig, install it first using apt-get install chkconfig.

$ reboot

Now you have a DNS Server ready.

Step7: Download OpenJDK and Whirr as below.


$ wget

$ tar -xzf whirr-0.8.1.tar.gz


Follow the instructions as in

That is it: you are now ready with the Whirr base. Create an Amazon Machine Image (AMI) out of this instance; it should be useful for getting a whirr base instance on demand. The next article will give inputs on how to launch a Hadoop cluster via the Whirr BYON service provider.