Welcome back to another “In A Jiffy” blog post, a series where we learn something quick at a beginner / intro level. Since HDInsight GA was announced on 28 Oct 2013, the user interface has been revamped, features have been modified, and the documentation has slowly been updated to reflect the changes. One question you might have is where to execute Hadoop commands.

Where Can I Execute Hadoop Commands?

One way to manage Windows Azure Blob Storage for HDInsight is with Hadoop commands, as described in the documentation article “Use Windows Azure Blob storage with HDInsight”. To do this, you’ll need to enable the remote connection to your HDInsight cluster and connect to it (just as you would connect to any server remotely). Once you are connected, there is luckily a desktop shortcut called “Hadoop Command Line” that makes it easy to execute Hadoop commands, e.g.

hadoop fs -ls /output/result.txt
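If you would rather script such calls than type them, here is a minimal Python sketch that builds the same `hadoop fs` command as an argument list and runs it only when a `hadoop` launcher is found on the PATH. The helper names `hadoop_fs_args` and `run_hadoop_fs` are made up for this illustration, and the sketch assumes Python is available on the machine where you run it, which this post does not cover.

```python
import shutil
import subprocess


def hadoop_fs_args(*fs_args):
    """Build the argument list for a `hadoop fs` call, e.g. ("-ls", "/output")."""
    return ["hadoop", "fs", *fs_args]


def run_hadoop_fs(*fs_args):
    """Run `hadoop fs ...` and return its stdout, or None if hadoop is not on PATH."""
    launcher = shutil.which("hadoop")  # hypothetical setup: hadoop is on PATH
    if launcher is None:
        return None
    # Swap the bare command name for the resolved launcher path.
    command = [launcher, *hadoop_fs_args(*fs_args)[1:]]
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.stdout


if __name__ == "__main__":
    # Show the command line that would be executed for the example above.
    print(" ".join(hadoop_fs_args("-ls", "/output/result.txt")))
```

Building the arguments as a list rather than one long string avoids quoting problems when a path contains spaces.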

For my version of the HDInsight cluster, the Hadoop distribution is located in this directory:

C:\apps\dist\hadoop-1.2.0.1.3.1.0-06

So if the desktop shortcut is not available, you can open Command Prompt on the server over the remote connection, change to a path similar to the one above, then start using the Hadoop commands.
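Because the version number in that folder name will differ between clusters, a small sketch like the one below can look for it instead of hard-coding it. It rests on several assumptions that are not from this post: that the directory reads `C:\apps\dist\hadoop-1.2.0.1.3.1.0-06` with backslashes, that the launcher sits at `bin\hadoop.cmd` inside it, and that Python is available on the cluster node. The function names are hypothetical.

```python
from pathlib import Path, PureWindowsPath


def hadoop_cmd_path(hadoop_home):
    """Return the assumed launcher path (bin\\hadoop.cmd) under a Hadoop directory."""
    return PureWindowsPath(hadoop_home) / "bin" / "hadoop.cmd"


def find_hadoop_homes(dist_root):
    """List hadoop-* directories under dist_root; meant to run on the cluster node."""
    root = Path(dist_root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("hadoop-*") if p.is_dir())


if __name__ == "__main__":
    # Assumed location of the distribution folder, based on the path above.
    for home in find_hadoop_homes(r"C:\apps\dist"):
        print(hadoop_cmd_path(home))
```

If the search comes back empty, browse the server's drive by hand; the folder layout may well differ on other HDInsight versions.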

That’s it for the “In A Jiffy” part.

Want more? Read on…

 

Configuring Remote Connection to HDInsight Cluster

Below is a set of instructions that explains how to set up a remote connection to your HDInsight cluster and where to execute Hadoop commands.

Prerequisites

1. An HDInsight cluster already created (see “Your First HDInsight Cluster–Step by Step” if you have not created an HDInsight cluster yet)

2. Access to Windows Azure Management Portal

The Steps

1. Log in to your Windows Azure Management Portal and go to the HDInsight cluster that you want to execute Hadoop commands against.

2. Go to the Configuration option of the HDInsight cluster and click on the “Enable Remote” button at the bottom of the screen.

HDInsight Cluster Configuration : Enable Remote

3. A “Configure Remote Desktop” window will open, where you can create a new user that can log in via Remote Desktop.

HDInsight: Configure Remote Desktop

Once you enter the details, you’ll see that the “Connect” and “Disable Remote” buttons are disabled while Remote Desktop access is being configured in the background.

HDInsight: Enabling remote desktop

The background configuration usually takes a couple of minutes (or less). Once it is done, click on the “Connect” button and it will start downloading an RDP file that you can open to connect to the HDInsight cluster.

HDInsight: Connect

HDInsight: Opening RDP file

4. When prompted, enter the credentials to connect remotely.

HDInsight: Log in via RDP

Once connected, you will see “Hadoop Command Line” on the desktop, and voilà, you can put your Hadoop skills to use here.

HDInsight: Hadoop Command Line shortcut

HDInsight: Hadoop command line

 

Wrap Up

Hadoop commands can be executed on an HDInsight cluster via a remote connection. You’ll first need to enable the remote connection.

 

Further Reading

Use Windows Azure Blob storage with HDInsight by Windows Azure

Your First HDInsight Cluster–Step by Step by Cindy Gross and Murshed Zaman

Upload data to Blob Storage using Hadoop Command Line by Windows Azure

 

 
