Prakash Reddy Vaka
pv6xc@umkc.edu
Installing Java on Ubuntu
1.Download the required JDK tarball from here
2.Unzip the tarball using "tar -zxvf tarball_name"
3.Create a folder named java in /usr/lib (you need root permission)
4.Move the extracted folder to /usr/lib/java/
5.Next, run the commands below in a terminal
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/java/jdk1.7.0_65/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/java/jdk1.7.0_65/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/java/jdk1.7.0_65/bin/javaws" 1
6.Update JAVA_HOME in your ~/.bashrc
export JAVA_HOME=/usr/lib/java/jdk1.7.0_65
PATH="$PATH:$JAVA_HOME/bin"
export PATH
Your bashrc should look like this: bashrc
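To verify the setup, reload your shell config and check the version (the output should report the JDK you installed, 1.7.0_65 in this example):
source ~/.bashrc
java -version
javac -version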
Installing Hadoop in pseudo-distributed mode on Ubuntu
1.Install JDK 1.6 or greater here
2.Download the required Hadoop version from here
3.Extract the folder and navigate to the conf folder inside it
4.Update JAVA_HOME inside the hadoop-env.sh file
5.Update your ~/.bashrc with HADOOP_HOME, your bashrc here
6.Modify your core-site.xml, hdfs-site.xml and mapred-site.xml (a sketch of each is below)
right click and save these links to view their content
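If the links above are unavailable, minimal single-node configs typically look like the sketch below. The ports and values are the common Hadoop 1.x defaults, not the contents of the linked files; adjust them for your setup.
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>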
7.Install ssh on your system using sudo apt-get install ssh
8.ssh localhost should log you in
9.Run the two commands below to generate and save the auth keys
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
10.Now your system is set up with Hadoop installed; format your namenode (only the first time)
hadoop namenode -format
11.Run start-all.sh to start the namenode, datanode, secondarynamenode, jobtracker and tasktracker
12.You can view the JobTracker at http://localhost:50030, the NameNode at http://localhost:50070, and the TaskTracker at http://localhost:50060
13.You can interact with HDFS using hadoop fs -ls / (a few more examples are below)
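A few more standard hadoop fs commands, for reference (localfile.txt is just a placeholder):
hadoop fs -mkdir /input
hadoop fs -put localfile.txt /input/
hadoop fs -cat /input/localfile.txt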
Installing Storm and ZooKeeper on Ubuntu
1.Install JDK 1.6 or greater, follow tutorial 1
2.Install Python 2.6.6 or greater from here
3.Download the zookeeper tarball from here
4.Download the apache storm tarball from here
5.Extract both tarballs and move them to /usr/lib
6.mv ~/Downloads/zookeeper-3.4.6 /usr/lib/
7.mv ~/Downloads/storm-0.9.2 /usr/lib/
8.You should have something similar to /usr/lib/zookeeper-3.4.6 and /usr/lib/storm-0.9.2
9.Update your ~/.bashrc with the zookeeper and storm locations, here (a sketch is below)
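If the link is unavailable, the additions look roughly like this (a sketch assuming the /usr/lib paths from steps 6 and 7):
export ZOOKEEPER_HOME=/usr/lib/zookeeper-3.4.6
export STORM_HOME=/usr/lib/storm-0.9.2
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$STORM_HOME/bin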
10.Create a zoo.cfg file under /usr/lib/zookeeper-3.4.6/conf like this (a sample is below)
11.The zoo.cfg file tells ZooKeeper where to save its data; storm is reliable because of zookeeper.
12.Even if the storm server (nimbus) goes down, all the state is stored in zookeeper.
13.The dataDir setting in zoo.cfg will differ on your machine (change it accordingly)
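A minimal zoo.cfg looks like this; tickTime and clientPort are the usual defaults, and dataDir is only an example path:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181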
14.Done!! You have installed storm and zookeeper; let's run them and check
15.Start zookeeper using zkServer.sh start; you can verify with zkCli.sh and ls /
16.Run the following commands (each in its own terminal) to start storm
storm nimbus
storm supervisor
storm ui
17.You can see the storm UI at http://localhost:8080
18.Cheers
Using storm : Basic storm example
1.Make sure your storm ui is started and running, http://localhost:8080
2.Clone the word count problem from this location here
3.git clone git://github.com/apache/incubator-storm.git && cd incubator-storm/examples/storm-starter
4.Install maven on your machine, you can download it from here
On Ubuntu you can install it using sudo apt-get install maven
5.Compile the project and package it using the commands below
◦mvn compile
◦mvn package
6.Your package is available in the target folder (a fat jar)
7.You can run the topology in two ways
◦Local mode : Runs only once; the topology is not deployed to the server. storm jar storm-starter-*-jar-with-dependencies.jar storm.starter.RollingTopWords
◦Remote mode : Deployed to storm as a topology, and can be managed through the UI. storm jar storm-starter-*-jar-with-dependencies.jar storm.starter.RollingTopWords production-topology remote
8.Now you can see the started topology production-topology in your storm UI
9.You can find more details here
Installing IBM BigInsights on Linux
1.Will post soon, you can find the reference here
Installing Apache Kafka
1.Download apache kafka from here
2.Unzip the file using tar -zxvf kafka-
3.Start zookeeper on your machine; if you don't have one running, start the zookeeper bundled with kafka
bin/zookeeper-server-start.sh config/zookeeper.properties
4.Now start the kafka broker
bin/kafka-server-start.sh config/server.properties
5.Create a topic to produce to and consume from
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
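You can verify the topic was created by listing the topics registered in zookeeper:
bin/kafka-topics.sh --list --zookeeper localhost:2181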
6.Create a producer for the topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
7.Type a message in the producer window
8.Create a consumer for the topic
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
9.You can see the messages in your consumer window!!!
Installation complete !!!!
10.Reference here
Run a map reduce job from eclipse : Windows/Linux/MacOSX
1.Create a simple java project or maven project with all the hadoop dependencies
2.Here is the pom.xml file for reference (the key dependency is sketched below).
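If the link is down, the one essential dependency for a Hadoop 1.x job looks like this (version 1.2.1 is an example; match it to your installed Hadoop):
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>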
3.Import the required jars, reference
4.Pass the input and output folders as arguments. Create the input folder in your project; the output folder will be created by the job.
5.Run the program to see the map reduce job run in standalone mode, with the output saved to a folder in your workspace
6.Here is the source code (a minimal word count job is sketched below).
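If the source link is unavailable, here is a minimal word count job sketched against the classic Hadoop MapReduce API. It illustrates the pattern; the linked code may differ.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper: emit (word, 1) for every token in the input line
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }
  // Reducer: sum the 1s for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      result.set(sum);
      context.write(key, result);
    }
  }
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input folder (step 4)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output folder, must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}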
Using eclipse to run storm starter
1.Using Maven
1.Install the maven plugin in your eclipse. Instructions are here
2.Run git clone git://github.com/apache/incubator-storm.git to get the storm starter code onto your machine.
3.Import the maven project in the examples folder into eclipse, and compile it.
4.Once compiled, right click the topology class you want to start and run it as a java application.
2.Using Manual imports
1.Make sure you have storm installed on your machine (download the required jars)
2.Create a new java project in eclipse.
3.Import all the jars from $STORM_HOME/lib as external jars into the project.
4.Create a topology and run it in local mode (a sketch follows this list).
5.Storm starter has sample topologies, which you can try.
6.With PrintSampleStream.java you can stream tweets into bolts. You need twitter4j.jar from here
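Here is a minimal local-mode topology sketched against the Storm 0.9.x API. TestWordSpout ships with storm-core; the class names LocalDemoTopology and PrinterBolt are made up for this illustration.
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class LocalDemoTopology {
  // A bolt that just prints every word it receives
  public static class PrinterBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
      System.out.println(tuple.getString(0));
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      // this bolt emits nothing downstream
    }
  }
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("words", new TestWordSpout());                     // built-in test spout
    builder.setBolt("print", new PrinterBolt()).shuffleGrouping("words");
    LocalCluster cluster = new LocalCluster();                          // in-process cluster, no nimbus needed
    cluster.submitTopology("demo", new Config(), builder.createTopology());
    Thread.sleep(10000);                                                // let it run for a few seconds
    cluster.shutdown();
  }
}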
Installing Cloudera on Ubuntu 12.04 Server
1. Change the hosts file (an example is below)
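For example, /etc/hosts should map each machine's fully qualified name to its static IP. The name and address below are made up; use your own:
127.0.0.1    localhost
192.168.1.10    cdh-master.example.com    cdh-master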
2. Install ssh
3. Save your public keys as 'authorized_keys' (see the ssh-keygen commands in the Hadoop tutorial above)
4. Download the cloudera repo from "here"
5. Install this file using "sudo dpkg -i cdh5-repository_1.0_all.deb"
6. Download the cloudera manager installer from "here"
7. Change permissions on the file: "chmod +x cloudera-manager-installer.bin"
8. Start the installer: "sudo ./cloudera-manager-installer.bin"
9. Follow the steps and your install is complete.
10. For reference, watch the video here