Prakash Reddy Vaka


Setting up java on ubuntu

  1. Download the required tarball from here

  2. Unzip this tarball using "tar -zxvf tarball_name"

  3. Create a folder named java in /usr/lib; you need root permission

  4. Move the extracted folder to /usr/lib/java/

  5. Run the commands below in a terminal
    sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/java/jdk1.7.0_65/bin/java" 1
    sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/java/jdk1.7.0_65/bin/javac" 1
    sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/java/jdk1.7.0_65/bin/javaws" 1

  6. Update JAVA_HOME in your ~/.bashrc
    export JAVA_HOME=/usr/lib/java/jdk1.7.0_65
    PATH="$PATH:$JAVA_HOME/bin"
    export PATH
    Your bashrc should look like bashrc
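
Once the steps above are done, a quick sanity check can save some head scratching. The snippet below is only a sketch: check_jdk_home is a made-up helper name (not part of any JDK tooling), and it assumes a JDK home is "good enough" when bin/java and bin/javac exist and are executable.

```shell
# Hypothetical helper: report whether a directory looks like a usable JDK home.
# It only checks that bin/java and bin/javac exist and are executable.
check_jdk_home() {
    local jdk_home="$1"
    if [ -x "$jdk_home/bin/java" ] && [ -x "$jdk_home/bin/javac" ]; then
        echo "ok: $jdk_home"
    else
        echo "missing java or javac under: $jdk_home" >&2
        return 1
    fi
}

# Example: check JAVA_HOME, falling back to the path used in the steps above
# (adjust the folder name to your JDK version).
check_jdk_home "${JAVA_HOME:-/usr/lib/java/jdk1.7.0_65}" || true
```

If it prints "ok", java -version and javac -version should work from a new terminal as well.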

Installing hadoop in pseudo-distributed mode on ubuntu

  1. Install JDK 1.6 or greater here

  2. Download the required hadoop version from here

  3. Extract the folder and navigate to the conf folder inside it

  4. Update the JAVA_HOME inside the file

  5. Update your ~/.bashrc with HADOOP_HOME, your bashrc here

  6. Modify your core-site.xml, hdfs-site.xml and mapred-site.xml
    right click and save these links to view their content

  7. Install ssh on your system using sudo apt-get install ssh

  8. ssh localhost should log you in

  9. Run the two commands below to save the auth keys
    $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    $ cat ~/.ssh/ >> ~/.ssh/authorized_keys

  10. Your system is now set up with hadoop; format your namenode (only the first time)
    hadoop namenode -format

  11. Start your namenode, datanode, secondarynamenode, jobtracker and tasktracker

  12. You can view the jobtracker, namenode and tasktracker at http://localhost:50030, http://localhost:50070 and http://localhost:50060 respectively

  13. You can interact with hdfs using hadoop fs -ls /
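
The three site files from step 6 are only linked, not shown, so here is a rough sketch of what a minimal pseudo-distributed setup could look like. The property values (hdfs://localhost:9000, replication 1, localhost:9001) are assumptions taken from the usual Hadoop 1.x single-node defaults and may differ from the linked files. CONF_OUT and write_site_file are made-up names; the files go into a scratch folder so nothing in your real conf folder is overwritten.

```shell
# Write the three files into a scratch directory first; copy them into the
# hadoop conf folder yourself once they look right. CONF_OUT is a placeholder.
CONF_OUT="${CONF_OUT:-$(mktemp -d)}"

write_site_file() {
    # $1 = file name, $2 = property name, $3 = property value
    cat > "$CONF_OUT/$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Assumed values for a single-machine Hadoop 1.x setup.
write_site_file core-site.xml   fs.default.name    hdfs://localhost:9000
write_site_file hdfs-site.xml   dfs.replication    1
write_site_file mapred-site.xml mapred.job.tracker localhost:9001

echo "wrote config files to $CONF_OUT"
```

Replication is set to 1 because there is only one datanode on a single machine.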

Installing storm

  1. Install JDK 1.6 or greater, follow tutorial 1

  2. Install python 2.6.6 or greater from here

  3. Download zookeeper tarball from here

  4. Download apache storm tarball from here

  5. Extract both the tarballs and move them to /usr/lib

  6. Move ~/Downloads/zookeeper-3.4.6 to /usr/lib/

  7. Move ~/Downloads/storm-0.9.2 to /usr/lib/

  8. You should have something similar to /usr/lib/zookeeper-3.4.6 and /usr/lib/storm-0.9.2

  9. Update your ~/.bashrc with zookeeper and storm locations, here

  10. Create a zoo.cfg file under /usr/lib/zookeeper-3.4.6/conf like this

  11. zoo.cfg tells zookeeper where to save its data; storm is reliable because of zookeeper.

  12. Even if the storm server (nimbus) goes down, the cluster state is kept in zookeeper.

  13. The dataDir setting in zoo.cfg will be different on your machine (change it accordingly)

  14. Done! You have installed storm and zookeeper; let's run them and check

  15. Start zookeeper using start, you can check using and ls /

  16. Run the following commands to start storm
    storm nimbus
    storm supervisor
    storm ui

  17. You can see the storm UI at http://localhost:8080

  18. Cheers
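
The zoo.cfg from step 10 is only linked, so below is a rough sketch of a standalone config. The values tickTime=2000 and clientPort=2181 are the commonly used standalone defaults and are an assumption here, not necessarily what the linked file contains. ZK_CONF_DIR and ZK_DATA_DIR are placeholder names; by default the file is written to a scratch folder, so point them at your real conf folder and data directory when you are happy with it.

```shell
# Placeholder locations; override them to write into the real zookeeper folder,
# e.g. ZK_CONF_DIR=/usr/lib/zookeeper-3.4.6/conf
ZK_CONF_DIR="${ZK_CONF_DIR:-$(mktemp -d)}"
ZK_DATA_DIR="${ZK_DATA_DIR:-$ZK_CONF_DIR/data}"
mkdir -p "$ZK_DATA_DIR"

cat > "$ZK_CONF_DIR/zoo.cfg" <<EOF
# length of one tick in milliseconds
tickTime=2000
# where zookeeper keeps its data (change it for your machine)
dataDir=$ZK_DATA_DIR
# port that clients such as storm connect to
clientPort=2181
EOF

cat "$ZK_CONF_DIR/zoo.cfg"
```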

Using storm : Basic storm example

  1. Make sure your storm ui is started and running, http://localhost:8080

  2. Clone the word count problem from this location here

  3. git clone git:// && cd incubator-storm/examples/storm-starter

  4. Install maven on your machine, you can download it from here
    On Ubuntu you can install using sudo apt-get install maven

  5. Compile the projects and package using the commands below

    1. mvn compile

    2. mvn package

  6. Your package is available in the target folder (a fat jar)

  7. You can run the topology in two ways

    1. Local mode: runs only once; the topology is not deployed to the server.
      storm jar storm-starter-*-jar-with-dependencies.jar storm.starter.RollingTopWords

    2. Remote mode: deployed to storm as a topology and can be managed from the UI.
      storm jar storm-starter-*-jar-with-dependencies.jar storm.starter.RollingTopWords production-topology remote

  8. Now you can see the started topology production-topology in your storm UI

  9. You can find more details here
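
If the storm jar command complains that it cannot find the jar, it usually means mvn package has not produced the fat jar yet. The sketch below shows one way to check for it before submitting. find_fat_jar is a made-up helper name, and it assumes the jar follows the storm-starter-*-jar-with-dependencies.jar pattern used in step 7.

```shell
# Hypothetical helper: print the storm-starter fat jar found in a folder,
# or fail with a hint if there is none.
find_fat_jar() {
    local target_dir="$1"
    local jar
    for jar in "$target_dir"/storm-starter-*-jar-with-dependencies.jar; do
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "no fat jar in $target_dir; run mvn package first" >&2
    return 1
}

# Example (run from the storm-starter folder after mvn package):
# prints the local mode command instead of running it.
if jar=$(find_fat_jar target 2>/dev/null); then
    echo "storm jar $jar storm.starter.RollingTopWords"
fi
```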

Installing IBM BigInsights on Linux

  1. Will post soon, you can find the reference here

Installing Kafka

  1. Download apache kafka from here

  2. Unzip the file using tar -zxvf kafka-

  3. Start zookeeper on your machine; if it is not already running, start the zookeeper that ships with the kafka download
    bin/ config/

  4. Now start kafka
    bin/ config/

  5. Create a topic to produce and consume
    bin/ --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

  6. Create a producer for the topic
    bin/ --broker-list localhost:9092 --topic test

  7. Type a message in the producer window

  8. Create a consumer for the topic
    bin/ --zookeeper localhost:2181 --topic test --from-beginning

  9. You can see the messages in your consumer window!
    Installation complete!

  10. Reference here
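
When topic creation or the producer hangs, the first thing to rule out is that zookeeper (port 2181) or the kafka broker (port 9092) is not actually listening. Here is a small sketch for that check. port_open is a made-up helper name, and it relies on bash's /dev/tcp redirection, so it assumes you run it with bash.

```shell
# Hypothetical helper: return 0 if something accepts TCP connections on
# host:port, non-zero otherwise. Uses bash's /dev/tcp, so run it with bash.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Ports taken from the commands above: 2181 for zookeeper, 9092 for kafka.
for port in 2181 9092; do
    if port_open localhost "$port"; then
        echo "localhost:$port is listening"
    else
        echo "localhost:$port is not reachable"
    fi
done
```

The same helper can be pointed at the hadoop and storm UI ports mentioned earlier.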

Run map reduce jobs from eclipse : Windows/Linux/MacOSX

  1. Create a simple java project or maven project with all the hadoop dependencies

  2. Here is the pom.xml file for reference.

  3. Import the required jars, reference

  4. Pass the input and output folders as arguments. Create the input folder in your project; the output folder will be created by the job.

  5. Run the program to see the map reduce job run in standalone mode, with the output saved to a folder in your workspace

  6. Here is the source code.

Using eclipse to run storm starter

  1. Using Maven

    1. Install the maven plugin in your eclipse. Instructions are here

    2. git clone git:// to get the storm starter code onto your local machine.

    3. Import the maven project in the example folder into your eclipse, and compile.

    4. Once compiled, right click the topology class you want to start and run as java application.

  2. Using Manual imports

    1. Make sure you have storm installed on your machine. (Download required jars)

    2. Create a new java project in eclipse.

    3. Import all the jars from $STORM_HOME/lib as external jars into the project.

    4. Create a topology and run it in local mode.

    5. Storm starter has sample topologies, which you can try.

    6. With you can stream tweets into bolts. You need twitter4.jar from here

Installing Cloudera on Ubuntu 12.04 Server

  1. Change the hosts file

  2. Install ssh

  3. Save public keys as 'authorized_keys'

  4. Download the cloudera repo from "here"

  5. Install this file using "sudo dpkg -i cdh5-repository_1.0_all.deb"

  6. Download the cloudera manager installer from ""

  7. Change permission on the file: "chmod +x cloudera-manager-installer.bin"

  8. Start the installer: "sudo ./cloudera-manager-installer.bin"

  9. Follow the steps and your install is complete.

  10. For reference watch the video here
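
Step 3 (saving public keys as authorized_keys) is easy to get wrong, either by adding the same key twice or by leaving permissions too open for sshd. The sketch below is one possible way to do it. add_authorized_key is a made-up helper name, and the 700/600 permissions are the values sshd normally expects with its default settings.

```shell
# Hypothetical helper: append a public key to authorized_keys only if it is
# not there yet, then tighten the permissions.
add_authorized_key() {
    local pub_key_file="$1"
    local ssh_dir="$2"
    local key
    key=$(cat "$pub_key_file")
    mkdir -p "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -x matches the whole line, -F treats the key as a fixed string
    if ! grep -qxF -e "$key" "$ssh_dir/authorized_keys"; then
        echo "$key" >> "$ssh_dir/authorized_keys"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Example usage (paths are illustrative):
#   add_authorized_key ~/.ssh/id_rsa.pub ~/.ssh
```

Running it twice with the same key leaves a single entry in the file.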