Thursday, March 14, 2013

BIG DATA HADOOP Testing with MapReduce Examples Part 3


BIG DATA is getting Bigger and Bigger
BIG DATA Getting Started with HADOOP
BIG DATA Cloudera and Oracle
BIG DATA CDH Single Node Setup
BIG DATA HADOOP Services Startup and Shutdown
BIG DATA Moving a file to HDFS
BIG DATA HADOOP Testing with MapReduce Examples Part 1
BIG DATA HADOOP Testing with MapReduce Examples Part 2
BIG DATA HADOOP Testing with MapReduce Examples Part 3

In BIG DATA HADOOP Testing with MapReduce Examples Part 1 and Part 2, I resolved some of the issues in getting HADOOP running, but a few issues were still left over. This time it is "Invalid shuffle port number -1 returned" when MapReduce jobs are submitted.



hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output
13/03/13 14:59:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/13 14:59:39 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/13 14:59:39 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/13 14:59:40 INFO input.FileInputFormat: Total input paths to process : 1
13/03/13 14:59:41 INFO mapreduce.JobSubmitter: number of splits:1
13/03/13 14:59:41 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/13 14:59:41 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/13 14:59:41 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/13 14:59:41 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/13 14:59:41 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/13 14:59:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1363184126427_0001
13/03/13 14:59:41 INFO client.YarnClientImpl: Submitted application application_1363184126427_0001 to ResourceManager at /0.0.0.0:8032
13/03/13 14:59:42 INFO mapreduce.Job: The url to track the job: http://bigdataserver1:8088/proxy/application_1363184126427_0001/
13/03/13 14:59:42 INFO mapreduce.Job: Running job: job_1363184126427_0001
13/03/13 14:59:53 INFO mapreduce.Job: Job job_1363184126427_0001 running in uber mode : false
13/03/13 14:59:53 INFO mapreduce.Job:  map 0% reduce 0%
13/03/13 14:59:54 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_0, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_0
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:55 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_1, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000003 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_1
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:57 INFO mapreduce.Job: Task Id : attempt_1363184126427_0001_m_000000_2, Status : FAILED
Container launch failed for container_1363184126427_0001_01_000004 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1363184126427_0001_m_000000_2
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:170)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:399)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

13/03/13 14:59:59 INFO mapreduce.Job:  map 100% reduce 0%
13/03/13 14:59:59 INFO mapreduce.Job: Job job_1363184126427_0001 failed with state FAILED due to: Task failed task_1363184126427_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

13/03/13 15:00:00 INFO mapreduce.Job: Counters: 4
        Job Counters
                Other local map tasks=3
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=0
                Total time spent by all reduces in occupied slots (ms)=0
hadoop@bigdataserver1:~/hadoop>


The solution is to update yarn-site.xml with the values below and then restart the HADOOP cluster.
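The XML snippet appears to have been stripped when this post was published. For CDH4, the yarn-site.xml entries usually needed to fix this error are the shuffle auxiliary-service settings (property names as of hadoop-2.0.0-cdh4; in later Hadoop releases the service name changed to mapreduce_shuffle):

```xml
<!-- yarn-site.xml: enable the MapReduce shuffle auxiliary service.
     Without it the NodeManager reports shuffle port -1 to the
     application master, causing the container launch failures above. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```

These properties go inside the existing &lt;configuration&gt; element of yarn-site.xml on every NodeManager host.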

hadoop@bigdataserver1:~/hadoop> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar wordcount /bigdata1/name.txt /bigdata1/output4
13/03/13 15:23:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/13 15:23:15 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/13 15:23:15 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/13 15:23:16 INFO input.FileInputFormat: Total input paths to process : 1
13/03/13 15:23:16 INFO mapreduce.JobSubmitter: number of splits:1
13/03/13 15:23:16 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/13 15:23:16 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/13 15:23:16 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/03/13 15:23:16 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/03/13 15:23:16 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/13 15:23:16 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/03/13 15:23:16 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/13 15:23:16 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/13 15:23:16 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/13 15:23:16 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/13 15:23:16 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/13 15:23:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1363188167312_0001
13/03/13 15:23:17 INFO client.YarnClientImpl: Submitted application application_1363188167312_0001 to ResourceManager at /0.0.0.0:8032
13/03/13 15:23:17 INFO mapreduce.Job: The url to track the job: http://bigdataserver1:8088/proxy/application_1363188167312_0001/
13/03/13 15:23:17 INFO mapreduce.Job: Running job: job_1363188167312_0001
13/03/13 15:23:27 INFO mapreduce.Job: Job job_1363188167312_0001 running in uber mode : false
13/03/13 15:23:27 INFO mapreduce.Job:  map 0% reduce 0%
13/03/13 15:23:32 INFO mapreduce.Job:  map 100% reduce 0%
13/03/13 15:23:38 INFO mapreduce.Job:  map 100% reduce 100%
13/03/13 15:23:38 INFO mapreduce.Job: Job job_1363188167312_0001 completed successfully
13/03/13 15:23:38 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=2369
                FILE: Number of bytes written=140677
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1474
                HDFS: Number of bytes written=1631
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3851
                Total time spent by all reduces in occupied slots (ms)=4999
        Map-Reduce Framework
                Map input records=200
                Map output records=199
                Map output bytes=2165
                Map output materialized bytes=2369
                Input split bytes=104
                Combine input records=199
                Combine output records=183
                Reduce input groups=183
                Reduce shuffle bytes=2369
                Reduce input records=183
                Reduce output records=183
                Spilled Records=366
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=57
                CPU time spent (ms)=2730
                Physical memory (bytes) snapshot=358436864
                Virtual memory (bytes) snapshot=926806016
                Total committed heap usage (bytes)=303431680
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1370
        File Output Format Counters
                Bytes Written=1631
hadoop@bigdataserver1:~/hadoop>



Finally, I have verified my HADOOP cluster.

BIG DATA Moving a file to HDFS


Here are simple steps to copy a file into HDFS (Hadoop Distributed File System).

First, create a directory



hadoop@bigdataserver1:> hadoop fs -mkdir /bigdata1
hadoop@bigdataserver1:>

Copy a sample file

hadoop@bigdataserver1:> hadoop fs -put /home/hadoop/hadoop/bigdata/name.txt /bigdata1
hadoop@bigdataserver1:>

Let us check if we can see the directory

hadoop@bigdataserver1:> hadoop fs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2013-03-13 13:14 /bigdata1
hadoop@bigdataserver1:>

Let us check if we can see the file

hadoop@bigdataserver1:> hadoop fs -ls /bigdata1
Found 1 items
-rw-r--r--   1 hadoop supergroup       1370 2013-03-13 13:14 /bigdata1/name.txt
hadoop@bigdataserver1:>



BIG DATA HADOOP Services Startup and Shutdown



When using start-all.sh, I got the message "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh".

So it is clear that start-all.sh should not be used, and the same holds true for stop-all.sh.

Running start-dfs.sh then gave me "JAVA_HOME is not set":


hadoop@bigdataserver1:~/hadoop/sbin> sh start-dfs.sh
which: no start-dfs.sh in (/home/hadoop/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/home/hadoop/hadoop/bin)
13/03/13 11:44:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
13/03/13 11:45:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop@bigdataserver1:~/hadoop/sbin> echo $JAVA_HOME
/home/hadoop/jdk1.6.0_18
hadoop@bigdataserver1:~/hadoop/sbin>


But you can see that the environment is sourced properly in the shell.

The solution is to set JAVA_HOME in hadoop-env.sh:


hadoop@fravm097023:~/hadoop/etc/hadoop> grep -i JAVA_HOME hadoop-env.sh
# The only required environment variable is JAVA_HOME.  All others are
# set JAVA_HOME in this file, so that it is correctly defined on
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/hadoop/jdk1.6.0_18
hadoop@fravm097023:~/hadoop/etc/hadoop>

After that, the DFS and YARN services come up.

hadoop@bigdataserver1:~/hadoop/sbin> sh start-dfs.sh
which: no start-dfs.sh in (/home/hadoop/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/home/hadoop/hadoop/bin)
13/03/13 11:51:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-bigdataserver1.out
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-bigdataserver1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-bigdataserver1.out
13/03/13 11:51:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop@bigdataserver1:~/hadoop/sbin>



hadoop@bigdataserver1:~/hadoop/sbin> sh start-yarn.sh
starting yarn daemons
which: no start-yarn.sh in (/home/hadoop/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/home/hadoop/hadoop/bin)
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-bigdataserver1.out
localhost: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-bigdataserver1.out
hadoop@bigdataserver1:~/hadoop/sbin>


BIG DATA CDH Single Node Setup



To get started with a single node setup here are some simple steps.

- Get a machine that supports CDH; in my case, SUSE Linux
- Create a hadoop user
- Download the hadoop tarball hadoop-2.0.0-cdh4.2.0.tar.gz
- Download and install JDK 1.6 or 1.7
- Unpack the tarball hadoop-2.0.0-cdh4.2.0.tar.gz
- Source the JAVA_HOME variable
- Modify the hadoop configuration files
- Set up passwordless ssh to localhost
- Format the hadoop namenode
- Start the services
- Validate the setup

I am jumping directly to the configuration file changes and then to formatting the namenode.
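The configuration listings appear to have been stripped when this post was published. A minimal single-node setup for hadoop-2.0.0-cdh4.2.0 likely resembled the following entries under etc/hadoop (property names as in CDH4; host and port values are illustrative, and each block sits inside its file's own &lt;configuration&gt; element):

```xml
<!-- core-site.xml: point clients at the local NameNode -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>

<!-- hdfs-site.xml: single node, so one replica is enough -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: shuffle auxiliary service for MapReduce -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
```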


Formatting the hadoop namenode

hadoop@bigdataserver1:~> hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

13/03/13 11:15:08 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = bigdataserver1/10.216.9.25
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.0.0-cdh4.2.0
STARTUP_MSG:   classpath = /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/hadoop/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-server-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-lang-2.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/junit-4.8.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-core-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/zookeeper-3.4.5-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/stax-api-1.0.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/activation-1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jets3t-0.6.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jline-0.9.94.jar:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-api-1.6.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/kfs-0.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-json-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-mappe
r-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-math-2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/hadoop-annotations-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-el-1.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/avro-1.7.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.cloudera.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:/home/hadoop/hadoop/share/hadoop/common/lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hadoop/hadoop/share/hadoop/common/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-io-2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/home/hadoop/hadoop/share/hadoop/common/lib/hadoop-auth-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jetty-6.1.26.cloudera.2.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.0.0-cdh4.2.0-tests.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.0.0-cdh4.2.0-sources.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.0.0-cdh4.2.0-test-sources.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.0.0-cdh4.2.0.jar:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/hadoop/h
adoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jersey-server-1.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-lang-2.5.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jersey-core-1.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/zookeeper-3.4.5-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.3.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jline-0.9.94.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-el-1.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-io-2.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.cloudera.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.0-cdh4.2.0-sources.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.0-cdh4.2.0-tests.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.0-cdh4.2.0-test-sources.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-server-1.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-core-1.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/paranamer-2.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jackson-core-a
sl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-guice-1.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/netty-3.2.4.Final.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/hadoop-annotations-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/avro-1.7.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-io-2.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-site-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.0.0-cdh4.2.0-tests.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.8.jar:/home/hadoop
/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/netty-3.2.4.Final.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/avro-1.7.3.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/commons-io-2.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.2.0-tests.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.0.0-cdh4.2.0.jar
STARTUP_MSG:   build = file:///var/lib/jenkins/workspace/CDH4.2.0-Packaging-Hadoop/build/cdh4/hadoop/2.0.0-cdh4.2.0/source/hadoop-common-project/hadoop-common -r 8bce4bd28a464e0a92950c50ba01a9deb1d85686; compiled by 'jenkins' on Fri Feb 15 10:42:32 PST 2013
STARTUP_MSG:   java = 1.6.0_18
************************************************************/
13/03/13 11:15:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-5d53a7be-005c-4d8b-9f93-088c795cbb35
13/03/13 11:15:10 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
13/03/13 11:15:10 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
13/03/13 11:15:10 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
13/03/13 11:15:10 INFO blockmanagement.BlockManager: defaultReplication         = 1
13/03/13 11:15:10 INFO blockmanagement.BlockManager: maxReplication             = 512
13/03/13 11:15:10 INFO blockmanagement.BlockManager: minReplication             = 1
13/03/13 11:15:10 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
13/03/13 11:15:10 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
13/03/13 11:15:10 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
13/03/13 11:15:10 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
13/03/13 11:15:10 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
13/03/13 11:15:10 INFO namenode.FSNamesystem: supergroup          = supergroup
13/03/13 11:15:10 INFO namenode.FSNamesystem: isPermissionEnabled = false
13/03/13 11:15:10 INFO namenode.FSNamesystem: HA Enabled: false
13/03/13 11:15:10 INFO namenode.FSNamesystem: Append Enabled: true
13/03/13 11:15:11 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/03/13 11:15:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
13/03/13 11:15:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
13/03/13 11:15:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
13/03/13 11:15:11 INFO namenode.NNStorage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
13/03/13 11:15:11 INFO namenode.FSImage: Saving image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
13/03/13 11:15:11 INFO namenode.FSImage: Image file of size 121 saved in 0 seconds.
13/03/13 11:15:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/03/13 11:15:11 INFO util.ExitUtil: Exiting with status 0
13/03/13 11:15:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bigdataserver1/10.216.9.25
************************************************************/
hadoop@bigdataserver1:~>



Start the services using start-all.sh (note: this script is deprecated; start-dfs.sh and start-yarn.sh are preferred).

Health check: you can verify the service using the URL below.
http://bigdataserver1.bigdata.com:50070/dfshealth.jsp

BIG DATA Cloudera and Oracle


Oracle Big Data Appliance is a supported big data infrastructure which is affordable, scalable, and integrated with the key components of a big data platform.

CDH (Cloudera's Distribution Including Apache Hadoop) is for enterprise deployment and is included with the Oracle Big Data Appliance. It has a free version that supports up to 50 nodes; the paid version can scale beyond that.



BIG DATA Getting Started with HADOOP


Hadoop is an open source project from Apache that evolved rapidly into a major technology movement. It is capable of handling large data sets, both structured and unstructured. It has the ability to run on low-cost clusters and can scale up rapidly.

The Hadoop architecture helps applications run on clusters of nodes holding thousands of terabytes of data. It has a distributed file system called HDFS (Hadoop Distributed File System) with fast data transfer rates between the clustered nodes, and it tolerates node failures. It does not require RAID storage, as it achieves reliability by replicating data across multiple hosts.

Hadoop is a collection of components like MapReduce, Hive, Pig, NoSQL databases, ZooKeeper, Ambari, HCatalog, Oozie, Hue and more.

MapReduce is one key component. It is a framework for writing applications that process large amounts of structured and unstructured data. MapReduce applications can be designed on a single-node Hadoop setup and then deployed on a 100-node cluster.

The classic MapReduce engine has one JobTracker, to which jobs are submitted. The JobTracker then pushes work to the TaskTracker nodes in the cluster. Together, the JobTracker and TaskTrackers complete the allocated job.

I will share my experience of getting a single-node Hadoop running and then running a sample MapReduce application in detail.

BIG DATA is getting Bigger and Bigger


BIG DATA is getting bigger and bigger with Oracle, SAP, IBM, EMC, HP, Software AG and many other players.



With the cost of storage and CPU going down, and with the big vendors pitching in with reliable solutions, it is now time for companies to look at crunching all that large, complex data.

So how is Oracle helping companies to eat the Big Mac?
Oracle Big Data Appliance with clusters of servers.
Cloudera's Distribution Including Apache Hadoop to acquire and organize data.
Oracle NoSQL Database Community Edition to acquire data.
Additional supporting system software such as Oracle Linux, the Oracle JVM and the R distribution.

So what does that have to do with an administrator like me?
No more terabytes; I have to think big: petabytes, exabytes, zettabytes, yottabytes and more.
Hadoop, the future of big data.
No more SQL alone; I have to know NoSQL (Not Only SQL).
Big data connectors integrating with Oracle Database and Exadata.
Analyzing and visualizing with Oracle Exalytics.
Cloud computing and big data.

So what are the likely responsibilities of a Big Data administrator?
Managing fast-growing big data clusters, NoSQL databases and Linux systems.
Meeting operational performance and availability targets.
Setting up monitoring and health checks of the clusters and all their associated components.
Defining and enforcing operational procedures, processes and best practices.
Providing leadership and taking ownership, in collaboration with teams, vendors and third parties, to resolve issues.
Effectively influencing, partnering and delivering results in a team environment.


With all the above said, I will dedicate some time to snacking on the Big Mac.



Tuesday, March 5, 2013

CeBIT CeBIT CeBIT CeBIT CeBIT

CeBIT is the world's leading high-tech event showcasing digital IT and telecommunications solutions, held from 5 to 9 March 2013 in Hanover, Germany.



This year I had the opportunity to be there.

It was a long day with lots and lots of walking.

Here are some of the interesting things I found at CeBIT:

  • German Chancellor Angela Merkel officially opened the world's biggest annual high-tech trade fair.
  • SAP's presence was very strong, along with all of its partners.
  • SAP HANA was especially visible; I had a hands-on Smart Energy demo on a 512 GB HANA appliance.
  • Everybody seems to be into in-memory computing.
  • Chinese vendors dominated phones, tablets and other electronics.
  • Android is everywhere.
  • Saw a 3D printer in action, which was really interesting (I spent two hours watching a model being printed).
  • Lots of space devoted to gaming.
I plan to visit CeBIT again later this week and will share more interesting stuff.

Monday, March 4, 2013

Enterprise Manager Grid Control 12c Upgrade

I am in the process of upgrading from Enterprise Manager Grid Control 11.1.0.1.0 to 12c Cloud Control using the one-system upgrade.






All looks good with the prerequisite checks, but there is a problem when the installer authenticates against the database.
The installer gives me ORA-00942: table or view does not exist, and I have no idea why.


I am seeing the message below in the installer log files:


java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
        at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
        at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:851)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1275)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1477)
        at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:392)
        at oracle.sysman.install.oneclick.queries.DBConnect.executeQuery(DBConnect.java:118)
        at oracle.sysman.install.oneclick.queries.executeSql.performQuery(executeSql.java:56)
        at oracle.sysman.install.oneclick.EMGCInputValidation.validate(EMGCInputValidation.java:2370)
        at oracle.sysman.install.oneclick.EMGCDatabaseConnectionDlgonNext.actionsOnClickofNext(EMGCDatabaseConnectionDlgonNext.java:395)
        at oracle.sysman.install.oneclick.EMGCNoSeedInstallDetailsDlg_new$PageValidationListener.wizardValidatePage(EMGCNoSeedInstallDetailsDlg_new.java:667)
        at oracle.bali.ewt.wizard.WizardPage.processWizardValidateEvent(WizardPage.java:710)
        at oracle.bali.ewt.wizard.WizardPage.validatePage(WizardPage.java:680)
        at oracle.bali.ewt.wizard.BaseWizard.validateSelectedPage(BaseWizard.java:2367)
        at oracle.bali.ewt.wizard.BaseWizard._validatePage(BaseWizard.java:3072)
        at oracle.bali.ewt.wizard.BaseWizard.doNext(BaseWizard.java:2152)
        at oracle.bali.ewt.wizard.dWizard.DWizard.doNext(DWizard.java:405)
        at oracle.bali.ewt.wizard.BaseWizard$Action$1.run(BaseWizard.java:3944)
        at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:209)
        at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:642)
        at java.awt.EventQueue.access$000(EventQueue.java:85)
        at java.awt.EventQueue$1.run(EventQueue.java:603)
        at java.awt.EventQueue$1.run(EventQueue.java:601)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
        at java.awt.EventQueue.dispatchEvent(EventQueue.java:612)
        at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
        at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)
        at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)

So far I have no clue about the error. I will keep this post updated in case it helps someone.
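One generic way to find out which statement is raising ORA-00942 is to have the database dump an errorstack (which includes the failing SQL) whenever that error occurs, then re-run the failing installer step. This is a sketch only: it assumes SYSDBA access to the repository database, and the event should be switched off again afterwards.

```sql
-- Run as SYSDBA on the repository database.
-- Dump an errorstack whenever ORA-00942 is raised
ALTER SYSTEM SET EVENTS '942 trace name errorstack level 3';

-- ...reproduce the error in the installer, then inspect the trace files
-- in the diagnostic trace directory and switch the event off again
ALTER SYSTEM SET EVENTS '942 trace name errorstack off';
```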

