Welcome back to another “In A Jiffy” blog post, a series where we learn something quick at a beginner / intro level. Since HDInsight GA was announced on 28 Oct 2013, the user interface has been revamped, features have been modified, and the documentation has slowly been updated to reflect the changes. One question you might have is where to execute Hadoop commands.
Where Can I Execute Hadoop Commands?
One way to manage Windows Azure Blob Storage for HDInsight is to use Hadoop commands, as described in the documentation Use Windows Azure Blob storage with HDInsight. To do this, you’ll need to enable the remote connection to your HDInsight cluster and connect to it (just as you would connect to any server remotely). Once you are connected, there is luckily a desktop shortcut called “Hadoop Command Line” that makes it easier to execute Hadoop commands, e.g.
hadoop fs -ls /output/result.txt
For my version of the HDInsight cluster, the Hadoop distribution file is located in this directory:
So if the desktop shortcut is not available, you can launch Command Prompt on the server over the remote connection, go to a path similar to the one above, and start using the Hadoop commands.
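If you would rather script a command like the one above than type it each time, here is a minimal sketch in Python. It assumes Python 3.7 or later is installed on the machine and that the hadoop launcher is on the PATH of the session you run it from; neither is something this post has confirmed for HDInsight. The helper names hadoop_fs_command and run_hadoop_fs are made up for illustration.

```python
import shutil
import subprocess


def hadoop_fs_command(*args, executable="hadoop"):
    """Build the argument list for a `hadoop fs` call, e.g. -ls /some/path."""
    return [executable, "fs", *args]


def run_hadoop_fs(*args):
    """Run `hadoop fs <args>` and return its standard output.

    Returns None when no hadoop launcher is found on PATH, for example
    when the script is run outside the cluster.
    """
    # On Windows, shutil.which also resolves launchers such as hadoop.cmd.
    launcher = shutil.which("hadoop")
    if launcher is None:
        return None
    result = subprocess.run(
        hadoop_fs_command(*args, executable=launcher),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if hadoop reports a failure
    )
    return result.stdout


if __name__ == "__main__":
    listing = run_hadoop_fs("-ls", "/output/result.txt")
    print(listing if listing is not None else "hadoop was not found on PATH")
```

Keeping the command as a list of arguments, rather than one long string, avoids quoting problems if a path ever contains spaces.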
That’s it for the “In A Jiffy” part.
Want more? Read on…
Configuring Remote Connection to HDInsight Cluster
Below is a set of instructions for setting up a remote connection to your HDInsight cluster and finding where to execute Hadoop commands. Before you start, you will need:
1. An HDInsight cluster (check Your First HDInsight Cluster–Step by Step if you have not created one yet)
2. Access to the Windows Azure Management Portal
With those in place, follow these steps:
1. Log in to your Windows Azure Management Portal and go to the HDInsight cluster that you want to execute the Hadoop commands against.
2. Go to the Configuration option of the HDInsight cluster and click on the “Enable Remote” button at the bottom of the screen.
3. A “Configure Remote Desktop” window will be launched, where you can create a new user that can log in via Remote Desktop.
Once you enter the details, you’ll see that the “Connect” and “Disable Remote” buttons are disabled while Remote Desktop access is being configured in the background.
The background configuration usually takes a couple of minutes (or less). Once it is done, click on the “Connect” button and it will start downloading an RDP file that you can use to connect to the HDInsight cluster.
4. When prompted, enter the credentials to connect remotely.
Once connected, you will see “Hadoop Command Line” on the desktop, and voila, you can make use of your Hadoop skills here.
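A path such as /output/result.txt is resolved against the cluster's default storage. As I understand the Blob storage documentation referenced in this post, the same file can also be addressed with a fully qualified wasb:// URI that names the container and storage account. The sketch below only builds such a URI and prints the matching command. The names wasb_uri, mycontainer and myaccount are placeholders made up for illustration, so swap in your own and check the URI format against that documentation.

```python
def wasb_uri(container, account, path):
    """Build a wasb:// URI for a blob in a Windows Azure storage account.

    Assumed format: wasb://<container>@<account>.blob.core.windows.net/<path>
    """
    return "wasb://{}@{}.blob.core.windows.net/{}".format(
        container, account, path.lstrip("/")
    )


if __name__ == "__main__":
    # Placeholder container and account names; replace with your own.
    uri = wasb_uri("mycontainer", "myaccount", "/output/result.txt")
    # Print the command you could paste into the Hadoop Command Line.
    print("hadoop fs -ls " + uri)
```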
Hadoop commands can be executed on an HDInsight cluster over a remote connection, which you’ll first need to enable.
Use Windows Azure Blob storage with HDInsight by Windows Azure
Your First HDInsight Cluster–Step by Step by Cindy Gross and Murshed Zaman
Upload data to Blob Storage using Hadoop Command Line by Windows Azure