Friday, April 18, 2014

Big Data Oracle NoSQL in No Time - It is time to Load Data for a Simple Use Case

Index
Big Data Oracle NoSQL in No Time - Getting Started Part 1
Big Data Oracle NoSQL in No Time - Startup & Shutdown Part 2

Big Data Oracle NoSQL in No Time - Setting up 1x1 Topology Part 3
Big Data Oracle NoSQL in No Time - Expanding 1x1 to 3x1 Topology Part 4
Big Data Oracle NoSQL in No Time - From 3x1 to 3x3 Topology Part 5
Big Data Oracle NoSQL in No Time - Smoke Testing Part 6
Big Data Oracle NoSQL in No Time - Increasing Throughput Read/Write Part 7
Big Data Oracle NoSQL in No Time - It is time to Upgrade
Big Data Oracle NoSQL in No Time - It is time to Load Data for a Simple Use Case

There are plenty of NoSQL use-case references out there, but I wanted to keep this one simple. I am not a developer, but my Unix scripting skills come to the rescue.

So here is what I am planning to build:

  • create a schema for storing server CPU details from the mpstat command
  • capture the metrics every minute
  • on 4 nodes
  • then build some dashboards

Avro Schema Design

Here I am creating an Avro schema that can hold the date and time along with the values from mpstat:

cpudata.avsc
{
  "type": "record",
  "name": "cpudata",
  "namespace": "avro",
  "fields": [
    {"name": "yyyy", "type": "int", "default": 0},
    {"name": "mm", "type": "int", "default": 0},
    {"name": "dd", "type": "int", "default": 0},
    {"name": "hh", "type": "int", "default": 0},
    {"name": "mi", "type": "int", "default": 0},
    {"name": "user", "type": "float", "default": 0},
    {"name": "nice", "type": "float", "default": 0},
    {"name": "sys", "type": "float", "default": 0},
    {"name": "iowait", "type": "float", "default": 0},
    {"name": "irq", "type": "float", "default": 0},
    {"name": "soft", "type": "float", "default": 0},
    {"name": "steal", "type": "float", "default": 0},
    {"name": "idle", "type": "float", "default": 0},
    {"name": "intr", "type": "float", "default": 0}
  ]
}

Now I am adding the schema to the store

$ java -jar $KVHOME/lib/kvstore.jar runadmin -host server1 -port 5000
kv-> ddl add-schema -file cpudata.avsc
Added schema: avro.cpudata.1
kv-> show schema
avro.cpudata
  ID: 1  Modified: 2014-04-18 00:29:58 UTC, From: server1
kv->

To load the data I am creating a shell script that writes the put kv -key command into a temporary file and then immediately loads that file into the store.
The script is scheduled through a crontab entry that runs every minute, so it captures the server's CPU metrics once a minute.

$ cat cpuload.sh
export KVHOME=$KVBASE/server2/oraclesoftware/kv-3.0.5
# build one "put kv" command from the current mpstat sample, keyed by /cpudata/<hostname>/<timestamp>, and write it to a temp file
echo `hostname` `date +"%d-%m-%Y-%H-%M-%S"` `date +"%-d"` `date +"%-m"` `date +"%Y"` `date +"%-H"` `date +"%-M"` `mpstat|tail -1`|awk '{print "put kv -key /cpudata/"$1"/"$2" -value \"{\\\"yyyy\\\":"$5",\\\"mm\\\":"$4",\\\"dd\\\":"$3",\\\"hh\\\":"$6",\\\"mi\\\":"$7",\\\"user\\\":"$10",\\\"nice\\\":"$11",\\\"sys\\\":"$12",\\\"iowait\\\":"$13",\\\"irq\\\":"$14",\\\"soft\\\":"$15",\\\"steal\\\":"$16",\\\"idle\\\":"$17",\\\"intr\\\":"$18" }\" -json avro.cpudata"}' > /tmp/1.load
# load the generated command into the store
java -jar $KVHOME/lib/kvcli.jar -host server1 -port 5000 -store mystore load -file /tmp/1.load
$
$ crontab -l
* * * * * /oraclenosql/work/cpuload.sh
$
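
For reference, each run of the script leaves a single line in /tmp/1.load that looks roughly like this (the numbers are only sample values, in line with the records shown further below):

put kv -key /cpudata/server1/18-04-2014-03-35-02 -value "{\"yyyy\":2014,\"mm\":4,\"dd\":18,\"hh\":3,\"mi\":35,\"user\":0.88,\"nice\":1.35,\"sys\":0.39,\"iowait\":1.04,\"irq\":0.00,\"soft\":0.01,\"steal\":0.04,\"idle\":96.30,\"intr\":713.04 }" -json avro.cpudata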

Since the job has been scheduled, I am checking whether the records are getting loaded.

kv-> get kv -key /cpudata -all -keyonly
/cpudata/server1/18-04-2014-03-35-02
/cpudata/server1/18-04-2014-03-36-02
2 Keys returned.
kv->

Since the program has only just started, it has two records for now.

kv-> aggregate -count -key /cpudata
count: 2
kv->

A detailed listing of the two records

kv-> get kv -key /cpudata/server1 -all
/cpudata/server1/18-04-2014-03-37-02
{
  "yyyy" : 2014,
  "mm" : 4,
  "dd" : 18,
  "hh" : 3,
  "mi" : 37,
  "user" : 0.8799999952316284,
  "nice" : 1.350000023841858,
  "sys" : 0.38999998569488525,
  "iowait" : 1.0399999618530273,
  "irq" : 0.0,
  "soft" : 0.009999999776482582,
  "steal" : 0.03999999910593033,
  "idle" : 96.30000305175781,
  "intr" : 713.0399780273438
}
/cpudata/server1/18-04-2014-03-35-02
{
  "yyyy" : 2014,
  "mm" : 4,
  "dd" : 18,
  "hh" : 3,
  "mi" : 35,
  "user" : 0.8799999952316284,
  "nice" : 1.350000023841858,
  "sys" : 0.38999998569488525,
  "iowait" : 1.0399999618530273,
  "irq" : 0.0,
  "soft" : 0.009999999776482582,
  "steal" : 0.03999999910593033,
  "idle" : 96.30000305175781,
  "intr" : 713.0399780273438
}

Now I am going to sleep, and the next day I am going to have some fun. With 24 hours completed, the store now has the CPU metrics for the whole day. Let me try some aggregate commands.

Average CPU usage 
kv-> aggregate -key /cpudata/server1 -avg user
avg(user): 0.8799999952316284

kv-> aggregate -key /cpudata/server1 -avg user,nice,sys,iowait,irq,soft,steal,idle,intr
avg(user): 0.8799999952316284
avg(nice): 1.350000023841858
avg(sys): 0.38999998569488525
avg(iowait): 1.0399999618530273
avg(irq): 0.0
avg(soft): 0.009999999776482582
avg(steal): 0.03999999910593033
avg(idle): 96.30000305175781
avg(intr): 713.0599822998047
kv->

Let me apply a range and look at the usage for one hour, and then for a five-minute window.

kv-> aggregate -key /cpudata/server1 -avg user,nice,sys,iowait,irq,soft,steal,idle,intr -start 18-04-2014-04 -end 18-04-2014-05

avg(user): 0.8799999952316284
avg(nice): 1.350000023841858
avg(sys): 0.38999998569488525
avg(iowait): 1.0399999618530273
avg(irq): 0.0
avg(soft): 0.009999999776482582
avg(steal): 0.03999999910593033
avg(idle): 96.30000305175781
avg(intr): 713.0399780273438
kv-> aggregate -key /cpudata/server1 -avg user,nice,sys,iowait,irq,soft,steal,idle,intr -start 18-04-2014-03-35-02 -end 18-04-2014-03-40-02

avg(user): 0.8799999952316284
avg(nice): 1.350000023841858
avg(sys): 0.38999998569488525
avg(iowait): 1.0399999618530273
avg(irq): 0.0
avg(soft): 0.009999999776482582
avg(steal): 0.03999999910593033
avg(idle): 96.30000305175781
avg(intr): 713.0849914550781
kv->


Interesting?

Time for some dashboards


Hourly CPU Idle Metric 

$ for i in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> do
> echo "connect store -host server1 -port 5000 -name mystore" > /tmp/1.lst
> echo "aggregate -key /cpudata/server1 -avg user,nice,sys,iowait,irq,soft,steal,idle,intr -start 18-04-2014-"$i" -end 18-04-2014-"$i >> /tmp/1.lst
> echo "18-04-2014-"$i" - "`java -jar $KVHOME/lib/kvstore.jar runadmin -host server1 -port 5000 load -file /tmp/1.lst|grep -i idle|awk '{print $2 }'`
> done
18-04-2014-01 - 96.27333068847656
18-04-2014-02 - 96.27999877929688
18-04-2014-03 - 96.30000305175781
18-04-2014-04 - 96.30000305175781
18-04-2014-05 - 96.30000305175781
18-04-2014-06 - 96.30000305175781
18-04-2014-07 - 96.28433303833008
18-04-2014-08 - 96.2699966430664
18-04-2014-09 - 96.2699966430664
18-04-2014-10 - 96.27333068847656
18-04-2014-11 - 96.27999877929688
18-04-2014-12 - 96.2870002746582
18-04-2014-13 - 96.29016761779785
18-04-2014-14 - 96.29683570861816
18-04-2014-15 - 96.302001953125
18-04-2014-16 - 96.30999755859375
18-04-2014-17 - 96.31849937438965
18-04-2014-18 - 96.32483406066895
18-04-2014-19 - 96.33000183105469
18-04-2014-20 - 96.3331667582194
18-04-2014-21 - 96.28135165652714
18-04-2014-22 - 96.27333068847656
18-04-2014-23 - 96.27999877929688
18-04-2014-24 - 96.27333068847656
$



Hourly CPU User Metric 

$ for i in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> do
> echo "connect store -host server1 -port 5000 -name mystore" > /tmp/1.lst
> echo "aggregate -key /cpudata/server1 -avg user,nice,sys,iowait,irq,soft,steal,idle,intr -start 18-04-2014-"$i" -end 18-04-2014-"$i >> /tmp/1.lst
> echo "18-04-2014-"$i" - "`java -jar $KVHOME/lib/kvstore.jar runadmin -host server1 -port 5000 load -file /tmp/1.lst|grep -i user|awk '{print $2 }'`
> done
18-04-2014-01 - 0.8899999856948853
18-04-2014-02 - 0.8899999856948853
18-04-2014-03 - 0.8799999952316284
18-04-2014-04 - 0.8799999952316284
18-04-2014-05 - 0.8799999952316284
18-04-2014-06 - 0.8799999952316284
18-04-2014-07 - 0.8819999933242798
18-04-2014-08 - 0.8906666517257691
18-04-2014-09 - 0.8899999856948853
18-04-2014-10 - 0.8899999856948853
18-04-2014-11 - 0.8899999856948853
18-04-2014-12 - 0.8899999856948853
18-04-2014-13 - 0.8899999856948853
18-04-2014-14 - 0.8899999856948853
18-04-2014-15 - 0.8899999856948853
18-04-2014-16 - 0.8899999856948853
18-04-2014-17 - 0.8899999856948853
18-04-2014-18 - 0.8899999856948853
18-04-2014-19 - 0.8899999856948853
18-04-2014-20 - 0.8899999856948853
18-04-2014-21 - 0.8921276432402591
18-04-2014-22 - 0.8799999952316284
18-04-2014-23 - 0.8799999952316284
18-04-2014-24 - 0.8899999856948853
$





Hourly CPU IOWAIT Metric

$ for i in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> do
> echo "connect store -host server1 -port 5000 -name mystore" > /tmp/1.lst
> echo "aggregate -key /cpudata/server1 -avg user,nice,sys,iowait,irq,soft,steal,idle,intr -start 18-04-2014-"$i" -end 18-04-2014-"$i >> /tmp/1.lst
> echo "18-04-2014-"$i" - "`java -jar $KVHOME/lib/kvstore.jar runadmin -host server1 -port 5000 load -file /tmp/1.lst|grep -i iowait|awk '{print $2 }'`
> done
18-04-2014-01 - 1.0907692328477516
18-04-2014-02 - 1.0499999523162842
18-04-2014-03 - 1.0399999618530273
18-04-2014-04 - 1.0399999618530273
18-04-2014-05 - 1.0373332977294922
18-04-2014-06 - 1.0299999713897705
18-04-2014-07 - 1.0403332948684691
18-04-2014-08 - 1.0499999523162842
18-04-2014-09 - 1.0499999523162842
18-04-2014-10 - 1.0499999523162842
18-04-2014-11 - 1.0499999523162842
18-04-2014-12 - 1.0499999523162842
18-04-2014-13 - 1.0481666207313538
18-04-2014-14 - 1.0499999523162842
18-04-2014-15 - 1.0449999570846558
18-04-2014-16 - 1.0399999618530273
18-04-2014-17 - 1.0399999618530273
18-04-2014-18 - 1.0399999618530273
18-04-2014-19 - 1.0399999618530273
18-04-2014-20 - 1.0398332953453064
18-04-2014-21 - 1.0907692328477516
18-04-2014-22 - 1.0499999523162842
18-04-2014-23 - 1.0907692328477516
18-04-2014-24 - 1.0499999523162842
$



So this NoSQL use case is very simple. I have scheduled the same job on a couple of other servers as well, so the store can be used to analyze the CPU metrics for all of my hosted servers. The Avro schema can also be expanded to carry a lot more information.
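
As a rough sketch of that expansion (the field names below are hypothetical additions, not something I have registered), extra metrics can be appended to the fields array with defaults so that the records already in the store still deserialize:

{"name": "guest", "type": "float", "default": 0},
{"name": "load1", "type": "float", "default": 0}

The updated cpudata.avsc then has to be re-registered from the admin CLI; if I remember the ddl add-schema command correctly, it takes an -evolve option for changing a schema that already exists in the store.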
