The latest technology and data news, analysis and ideas from the DataMine Lab blog

YCSB run against HBase 0.92 on Amazon Elastic MapReduce
September 16, 2012 by Krystian Nowak

In this post we show how, in a few simple steps, you can use the Yahoo! Cloud Serving Benchmark (https://github.com/dataminelab/YCSB) to run benchmarks against an HBase 0.92 cluster deployed automatically by Amazon Elastic MapReduce (EMR), and what measurements and comparisons you can obtain when choosing among the available instance types.

We will create EMR HBase clusters using the tooling provided by Amazon:
http://elasticmapreduce.s3.amazonaws.com/elastic-mapreduce-ruby.zip

Note: As you can see in commands.rb, default_hadoop_version is set to 0.20(.x), but our tests found that Hadoop 1.0.3 gives a significant performance gain. Therefore, when creating the EMR cluster, we will set this version explicitly.

Let’s create one:

elastic-mapreduce --create \
--hbase \
--name "EMR HBase YCSB" \
--num-instances 2 \
--instance-type m1.large \
--hadoop-version 1.0.3
Created job flow j-1PP3JU6UJ0HQ1

elastic-mapreduce --list --active
j-1PP3JU6UJ0HQ1     WAITING
ec2-23-22-19-48.compute-1.amazonaws.com          EMR HBase YCSB
 COMPLETED      Start HBase
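
The master's public DNS name from this listing is what the scp and ssh commands further down need. A minimal sketch for pulling it out of the listing text, assuming the output looks like the sample above (the helper name master_host is our own and is not part of the elastic-mapreduce tooling):

```shell
# Hypothetical helper: print the first EC2 public DNS name found on stdin.
# Assumes the listing format shown in this post; adjust the pattern if
# your tooling prints something different.
master_host() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^ec2-[0-9-]+\..*amazonaws\.com$/) { print $i; exit }
  }'
}

# Example with the listing from this post:
printf '%s\n' \
  'j-1PP3JU6UJ0HQ1     WAITING' \
  'ec2-23-22-19-48.compute-1.amazonaws.com          EMR HBase YCSB' \
  ' COMPLETED      Start HBase' | master_host
```

Against a live cluster this would presumably be used as: elastic-mapreduce --list --active | master_host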

Build the project (the HBase master server variables should now default to localhost (127.0.0.1)).

git clone git@github.com:dataminelab/YCSB.git
cd YCSB
export MAVEN_OPTS="-Xmx512m -Xms128m -Xss2m"

(see http://jira.codehaus.org/browse/MASSEMBLY-549 for why…)

mvn clean install -Dcheckstyle.skip=true
cd distribution/target
scp -i ~/.ssh/dataminelab-ec2.pem ycsb-0.1.5-SNAPSHOT.tar.gz \
hadoop@ec2-23-22-19-48.compute-1.amazonaws.com:/home/hadoop/ycsb.tar.gz 
ssh -i ~/.ssh/dataminelab-ec2.pem \
hadoop@ec2-23-22-19-48.compute-1.amazonaws.com
tar xvzf ycsb.tar.gz
ln -s ycsb-0.1.5-SNAPSHOT ycsb
cd ycsb

Create the working table in HBase (already pre-split):

hbase org.apache.hadoop.hbase.util.RegionSplitter usertable -c 200 -f family

The split is hard to get perfect, because https://issues.apache.org/jira/browse/HBASE-4163 is still not in place. Please vote! :)
But it still seems to be better than no split at all!

You might spot:

12/08/25 13:39:16 ERROR metrics.MetricsSaver:
Failed SaveRecords hdfs:/mnt/var/lib/hadoop/metrics/raw/i-694c4712_04272_raw.bin
Shutdown in progress

as in https://forums.aws.amazon.com/thread.jspa?threadID=100643 but it doesn’t seem to hurt us…

hbase shell
scan '.META.', {COLUMNS => 'info:regioninfo'}
exit

Load the initial data into HBase:

./bin/ycsb load hbase -p columnfamily=family -P workloads/workloada | tee load.log

Check with your own eyes that the data has been loaded into HBase:

hbase shell

hbase(main):001:0> count 'usertable'
Current count: 1000, row: user995698996184959679
1000 row(s) in 2.3210 seconds

And run the tests – only as a warm-up:

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=10000 \
-s \
-threads 10 | tee warm-up-tests.log

And now the real tests with 10 threads:

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10 | tee real-tests-workload-a.log

cat real-tests-workload-a.log

[OVERALL], RunTime(ms), 47132.0
[OVERALL], Throughput(ops/sec), 2121.700755325469
[UPDATE], Operations, 50209
[UPDATE], AverageLatency(us), 186.93305980999423
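
Each summary line in the log has the form "[SECTION], Metric, value", so single figures are easy to pull out for later comparison. The sketch below assumes exactly that layout; the helper names ycsb_metric and throughput are our own and are not part of YCSB. The second helper cross-checks the reported throughput, which works out as the operation count divided by the runtime in seconds:

```shell
# Hypothetical helper: print the value of one "[SECTION], Metric, value" line.
# Usage: ycsb_metric SECTION METRIC < logfile
ycsb_metric() {
  awk -F', ' -v sec="[$1]" -v met="$2" \
    '$1 == sec && $2 == met { print $3; exit }'
}

# Cross-check: throughput (ops/sec) = operations * 1000 / runtime in ms.
# Usage: throughput OPERATIONS RUNTIME_MS
throughput() {
  awk -v ops="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", ops * 1000 / ms }'
}

# Example with the figures from the run above:
printf '%s\n' \
  '[OVERALL], RunTime(ms), 47132.0' \
  '[OVERALL], Throughput(ops/sec), 2121.700755325469' |
  ycsb_metric OVERALL 'Throughput(ops/sec)'
throughput 100000 47132.0
```

For the run above, 100000 operations in 47132 ms give about 2121.70 ops/sec, which agrees with the value YCSB reports.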

And again with 10 threads, but for another workload type (workload f):

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s -threads 10 | tee real-tests-workload-f.log
cat real-tests-workload-f.log

[OVERALL], RunTime(ms), 52748.0
[OVERALL], Throughput(ops/sec), 1895.8064760749223
[UPDATE], Operations, 50018
[UPDATE], AverageLatency(us), 11.925006997480907

Now we can check how these workload scenarios behave as the number of threads increases.
Starting with 100 threads:

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 100 | tee real-tests-workload-a-100t.log
cat real-tests-workload-a-100t.log

[OVERALL], RunTime(ms), 24234.0
[OVERALL], Throughput(ops/sec), 4126.433935792688
[UPDATE], Operations, 50063
[UPDATE], AverageLatency(us), 1076.5547010766434

500 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 500 | tee real-tests-workload-a-500t.log
cat real-tests-workload-a-500t.log

[OVERALL], RunTime(ms), 20706.0
[OVERALL], Throughput(ops/sec), 4829.518014102193
[UPDATE], Operations, 50099
[UPDATE], AverageLatency(us), 6167.192359128925

1000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 1000 | tee real-tests-workload-a-1kt.log
cat real-tests-workload-a-1kt.log

[OVERALL], RunTime(ms), 21484.0
[OVERALL], Throughput(ops/sec), 4654.626698938745
[UPDATE], Operations, 49988
[UPDATE], AverageLatency(us), 9423.208390013604

2000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 2000 | tee real-tests-workload-a-2kt.log
cat real-tests-workload-a-2kt.log

[OVERALL], RunTime(ms), 24358.0
[OVERALL], Throughput(ops/sec), 4105.427374989737
[UPDATE], Operations, 49957
[UPDATE], AverageLatency(us), 7786.985767760274
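
The sweep above repeats one command and changes only the -threads value, so it could be scripted. This is a rough sketch rather than what we actually ran: the function name ycsb_sweep and the DRY_RUN switch are our own, and the log names it produces (for example ...-1000t.log) differ slightly from the hand-typed ones in this post (...-1kt.log).

```shell
# Sketch of the thread sweep. Set DRY_RUN=1 to print the commands
# instead of running them. Function and variable names are our own.
# Usage: ycsb_sweep WORKLOAD_LETTER THREADS...
ycsb_sweep() {
  workload=$1; shift
  for t in "$@"; do
    log="real-tests-workload-$workload-${t}t.log"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Only show what would be executed.
      echo "./bin/ycsb run hbase -p columnfamily=family -P workloads/workload$workload -p operationcount=100000 -s -threads $t | tee $log"
    else
      ./bin/ycsb run hbase \
        -p columnfamily=family \
        -P "workloads/workload$workload" \
        -p operationcount=100000 \
        -s \
        -threads "$t" | tee "$log"
    fi
  done
}

# Print the commands for the workload a sweep without running them:
DRY_RUN=1 ycsb_sweep a 100 500 1000 2000
```

Dropping DRY_RUN=1 runs the benchmark for each thread count in turn, from inside the ycsb directory on the master.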

And the same for the other workload scenario now:
100 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 100 | tee real-tests-workload-f-100t.log
cat real-tests-workload-f-100t.log

[OVERALL], RunTime(ms), 33924.0
[OVERALL], Throughput(ops/sec), 2947.7655936799906
[UPDATE], Operations, 50136
[UPDATE], AverageLatency(us), 17.44125977341631

1000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 1000 | tee real-tests-workload-f-1kt.log
cat real-tests-workload-f-1kt.log

[OVERALL], RunTime(ms), 29309.0
[OVERALL], Throughput(ops/sec), 3411.921252857484
[UPDATE], Operations, 50127
[UPDATE], AverageLatency(us), 16.611586570111914

2000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 2000 | tee real-tests-workload-f-2kt.log
cat real-tests-workload-f-2kt.log

[OVERALL], RunTime(ms), 29311.0
[OVERALL], Throughput(ops/sec), 3411.688444611238
[UPDATE], Operations, 49951
[UPDATE], AverageLatency(us), 59.80148545574663

3000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 3000 | tee real-tests-workload-f-3kt.log
cat real-tests-workload-f-3kt.log

[OVERALL], RunTime(ms), 32314.0
[OVERALL], Throughput(ops/sec), 3063.6875657609703
[UPDATE], Operations, 49492
[UPDATE], AverageLatency(us), 20.00127293299927

4000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 4000 | tee real-tests-workload-f-4kt.log
cat real-tests-workload-f-4kt.log

[OVERALL], RunTime(ms), 35051.0
[OVERALL], Throughput(ops/sec), 2852.985649482183
[UPDATE], Operations, 50095
[UPDATE], AverageLatency(us), 38.50611837508733

Let’s now try more instances: instead of just one slave, 4 slaves of the same type as before (--num-instances counts the master as well, hence 5).

elastic-mapreduce --create \
--hbase \
--name "EMR HBase YCSB" \
--num-instances 5 \
--instance-type m1.large \
--hadoop-version 1.0.3
Created job flow j-OE7G6YUHMD2I

elastic-mapreduce --list --active
j-OE7G6YUHMD2I      WAITING
ec2-50-17-100-242.compute-1.amazonaws.com         EMR HBase YCSB
COMPLETED      Start HBase

Now just copy the already built test suite:

scp -i ~/.ssh/dataminelab-ec2.pem ycsb-0.1.5-SNAPSHOT.tar.gz \
hadoop@ec2-50-17-100-242.compute-1.amazonaws.com:/home/hadoop/ycsb.tar.gz
ssh -i ~/.ssh/dataminelab-ec2.pem \
hadoop@ec2-50-17-100-242.compute-1.amazonaws.com

tar xvzf ycsb.tar.gz
ln -s ycsb-0.1.5-SNAPSHOT ycsb
cd ycsb

Initialize table:

hbase org.apache.hadoop.hbase.util.RegionSplitter usertable -c 200 -f family

Load initial data:

./bin/ycsb load hbase \
-p columnfamily=family \
-P workloads/workloada | tee load.log

And run tests:
warm-up

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=10000 \
-s \
-threads 10 | tee warm-up-tests.log

10 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10 | tee real-tests-workload-a.log
cat real-tests-workload-a.log

[OVERALL], RunTime(ms), 42609.0
[OVERALL], Throughput(ops/sec), 2346.9220117815485
[UPDATE], Operations, 50073
[UPDATE], AverageLatency(us), 117.53685618996265

100 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 100 | tee real-tests-workload-a-100t.log
cat real-tests-workload-a-100t.log

[OVERALL], RunTime(ms), 23500.0
[OVERALL], Throughput(ops/sec), 4255.31914893617
[UPDATE], Operations, 49837
[UPDATE], AverageLatency(us), 1089.7759295302687

500 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 500 | tee real-tests-workload-a-500t.log
cat real-tests-workload-a-500t.log

[OVERALL], RunTime(ms), 19763.0
[OVERALL], Throughput(ops/sec), 5059.960532307848
[UPDATE], Operations, 50196
[UPDATE], AverageLatency(us), 4854.259104311101

1000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 1000 | tee real-tests-workload-a-1kt.log
cat real-tests-workload-a-1kt.log

[OVERALL], RunTime(ms), 20028.0
[OVERALL], Throughput(ops/sec), 4993.0097862991815
[UPDATE], Operations, 49904
[UPDATE], AverageLatency(us), 9582.977617024688

2000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 2000 | tee real-tests-workload-a-2kt.log
cat real-tests-workload-a-2kt.log

[OVERALL], RunTime(ms), 22608.0
[OVERALL], Throughput(ops/sec), 4423.2130219391365
[UPDATE], Operations, 49988
[UPDATE], AverageLatency(us), 6244.29357045691

5000 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 5000 | tee real-tests-workload-a-5kt.log
cat real-tests-workload-a-5kt.log

[OVERALL], RunTime(ms), 24861.0
[OVERALL], Throughput(ops/sec), 4022.3643457624394
[UPDATE], Operations, 50100
[UPDATE], AverageLatency(us), 8150.377125748503

10k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10000 | tee real-tests-workload-a-10kt.log
cat real-tests-workload-a-10kt.log

[OVERALL], RunTime(ms), 25336.0
[OVERALL], Throughput(ops/sec), 3946.9529523208084
[UPDATE], Operations, 50176
[UPDATE], AverageLatency(us), 8851.578204719388

workload f, 10 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10 | tee real-tests-workload-f.log
cat real-tests-workload-f.log

[OVERALL], RunTime(ms), 53310.0
[OVERALL], Throughput(ops/sec), 1875.8206715438005
[UPDATE], Operations, 49867
[UPDATE], AverageLatency(us), 12.18058034371428

100 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 100 | tee real-tests-workload-f-100t.log
cat real-tests-workload-f-100t.log

[OVERALL], RunTime(ms), 30991.0
[OVERALL], Throughput(ops/sec), 3226.7432480397533
[UPDATE], Operations, 50145
[UPDATE], AverageLatency(us), 13.73040183467943

1k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 1000 | tee real-tests-workload-f-1kt.log
cat real-tests-workload-f-1kt.log

[OVERALL], RunTime(ms), 29185.0
[OVERALL], Throughput(ops/sec), 3426.4176803152304
[UPDATE], Operations, 50047
[UPDATE], AverageLatency(us), 29.82979998801127

2k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 2000 | tee real-tests-workload-f-2kt.log
cat real-tests-workload-f-2kt.log

[OVERALL], RunTime(ms), 31906.0
[OVERALL], Throughput(ops/sec), 3134.206732276061
[UPDATE], Operations, 50111
[UPDATE], AverageLatency(us), 24.55253337590549

3k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 3000 | tee real-tests-workload-f-3kt.log
cat real-tests-workload-f-3kt.log

[OVERALL], RunTime(ms), 34410.0
[OVERALL], Throughput(ops/sec), 2877.070619006103
[UPDATE], Operations, 49607
[UPDATE], AverageLatency(us), 23.37424153849255

Now let’s see how even more serious instances offered by AWS would behave in this scenario!
m1.xlarge (2 x more memory, 2 x more CPU than m1.large)

elastic-mapreduce --create \
--hbase \
--name "EMR HBase YCSB" \
--num-instances 5 \
--instance-type m1.xlarge \
--hadoop-version 1.0.3
Created job flow j-2ICBS9029MJAV

./elastic-mapreduce --list --active
j-2ICBS9029MJAV      WAITING
ec2-107-21-130-111.compute-1.amazonaws.com         EMR HBase YCSB
COMPLETED      Start HBase

scp -i ~/.ssh/dataminelab-ec2.pem ycsb-0.1.5-SNAPSHOT.tar.gz \
hadoop@ec2-107-21-130-111.compute-1.amazonaws.com:/home/hadoop/ycsb.tar.gz
ssh -i ~/.ssh/dataminelab-ec2.pem \
hadoop@ec2-107-21-130-111.compute-1.amazonaws.com

tar xvzf ycsb.tar.gz
ln -s ycsb-0.1.5-SNAPSHOT ycsb
cd ycsb

hbase org.apache.hadoop.hbase.util.RegionSplitter usertable -c 200 -f family

./bin/ycsb load hbase \
-p columnfamily=family \
-P workloads/workloada | tee load.log

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=10000 \
-s \
-threads 10 | tee warm-up-tests.log

10 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10 | tee real-tests-workload-a.log
cat real-tests-workload-a.log

[OVERALL], RunTime(ms), 39481.0
[OVERALL], Throughput(ops/sec), 2532.8639092221574
[UPDATE], Operations, 49981
[UPDATE], AverageLatency(us), 62.85440467377604

100 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 100 | tee real-tests-workload-a-100t.log
cat real-tests-workload-a-100t.log

[OVERALL], RunTime(ms), 17877.0
[OVERALL], Throughput(ops/sec), 5593.779716954747
[UPDATE], Operations, 50100
[UPDATE], AverageLatency(us), 640.4568662674651

1k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s -threads 1000 | tee real-tests-workload-a-1kt.log
cat real-tests-workload-a-1kt.log

[OVERALL], RunTime(ms), 13986.0
[OVERALL], Throughput(ops/sec), 7150.00715000715
[UPDATE], Operations, 49750
[UPDATE], AverageLatency(us), 8759.566291457286

2k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 2000 | tee real-tests-workload-a-2kt.log
cat real-tests-workload-a-2kt.log

[OVERALL], RunTime(ms), 14783.0
[OVERALL], Throughput(ops/sec), 6764.526821348847
[UPDATE], Operations, 50118
[UPDATE], AverageLatency(us), 26718.534857735744

3k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 3000 | tee real-tests-workload-a-3kt.log
cat real-tests-workload-a-3kt.log

[OVERALL], RunTime(ms), 15477.0
[OVERALL], Throughput(ops/sec), 6396.588486140725
[UPDATE], Operations, 49465
[UPDATE], AverageLatency(us), 12066.01403012231

4k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 4000 | tee real-tests-workload-a-4kt.log
cat real-tests-workload-a-4kt.log

[OVERALL], RunTime(ms), 15261.0
[OVERALL], Throughput(ops/sec), 6552.650547146321
[UPDATE], Operations, 49883
[UPDATE], AverageLatency(us), 22551.664294449012

another workload, 10 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10 | tee real-tests-workload-f.log
cat real-tests-workload-f.log

[OVERALL], RunTime(ms), 45751.0
[OVERALL], Throughput(ops/sec), 2185.744573889095
[UPDATE], Operations, 49950
[UPDATE], AverageLatency(us), 9.801721721721721

500 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 500 | tee real-tests-workload-f-500t.log
cat real-tests-workload-f-500t.log

[OVERALL], RunTime(ms), 21870.0
[OVERALL], Throughput(ops/sec), 4572.473708276178
[UPDATE], Operations, 49678
[UPDATE], AverageLatency(us), 11.18187125085551

1k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 1000 | tee real-tests-workload-f-1kt.log
cat real-tests-workload-f-1kt.log

[OVERALL], RunTime(ms), 19207.0
[OVERALL], Throughput(ops/sec), 5206.435153850159
[UPDATE], Operations, 49879
[UPDATE], AverageLatency(us), 11.812406022574631

2k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 2000 | tee real-tests-workload-f-2kt.log
cat real-tests-workload-f-2kt.log

[OVERALL], RunTime(ms), 20493.0
[OVERALL], Throughput(ops/sec), 4879.715024642561
[UPDATE], Operations, 50114
[UPDATE], AverageLatency(us), 12.770423434569182

And now for more CPU power!
c1.xlarge (same memory, 5 x more CPU than m1.large)

elastic-mapreduce --create \
--hbase \
--name "EMR HBase YCSB" \
--num-instances 5 \
--instance-type c1.xlarge \
--hadoop-version 1.0.3
Created job flow j-3KZHQRG2D74AY

./elastic-mapreduce --list --active
j-3KZHQRG2D74AY     WAITING
ec2-75-101-255-226.compute-1.amazonaws.com          EMR HBase YCSB
COMPLETED      Start HBase

scp -i ~/.ssh/dataminelab-ec2.pem ycsb-0.1.5-SNAPSHOT.tar.gz \
hadoop@ec2-75-101-255-226.compute-1.amazonaws.com:/home/hadoop/ycsb.tar.gz
ssh -i ~/.ssh/dataminelab-ec2.pem \
hadoop@ec2-75-101-255-226.compute-1.amazonaws.com

tar xvzf ycsb.tar.gz
ln -s ycsb-0.1.5-SNAPSHOT ycsb
cd ycsb

hbase org.apache.hadoop.hbase.util.RegionSplitter usertable -c 200 -f family

./bin/ycsb load hbase \
-p columnfamily=family \
-P workloads/workloada | tee load.log

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=10000 \
-s \
-threads 10 | tee warm-up-tests.log

10 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10 | tee real-tests-workload-a.log
cat real-tests-workload-a.log

[OVERALL], RunTime(ms), 32121.0
[OVERALL], Throughput(ops/sec), 3113.228106223343
[UPDATE], Operations, 49973
[UPDATE], AverageLatency(us), 71.10029415884577

100 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 100 | tee real-tests-workload-a-100t.log
cat real-tests-workload-a-100t.log

[OVERALL], RunTime(ms), 15076.0
[OVERALL], Throughput(ops/sec), 6633.059166887769
[UPDATE], Operations, 50167
[UPDATE], AverageLatency(us), 644.8327187194769

1k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 1000 | tee real-tests-workload-a-1kt.log
cat real-tests-workload-a-1kt.log

[OVERALL], RunTime(ms), 12864.0
[OVERALL], Throughput(ops/sec), 7773.63184079602
[UPDATE], Operations, 50240
[UPDATE], AverageLatency(us), 9889.390306528663

2k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 2000 | tee real-tests-workload-a-2kt.log
cat real-tests-workload-a-2kt.log

[OVERALL], RunTime(ms), 14889.0
[OVERALL], Throughput(ops/sec), 6716.367788300087
[UPDATE], Operations, 50216
[UPDATE], AverageLatency(us), 41222.41986617811

3k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 3000 | tee real-tests-workload-a-3kt.log
cat real-tests-workload-a-3kt.log

[OVERALL], RunTime(ms), 14461.0
[OVERALL], Throughput(ops/sec), 6845.9995850909345
[UPDATE], Operations, 49451
[UPDATE], AverageLatency(us), 51852.53568178601

5k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 5000 | tee real-tests-workload-a-5kt.log
cat real-tests-workload-a-5kt.log

[OVERALL], RunTime(ms), 17072.0
[OVERALL], Throughput(ops/sec), 5857.544517338331
[UPDATE], Operations, 49835
[UPDATE], AverageLatency(us), 82378.54861041436

10k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloada \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10000 | tee real-tests-workload-a-10kt.log
cat real-tests-workload-a-10kt.log

[OVERALL], RunTime(ms), 20226.0
[OVERALL], Throughput(ops/sec), 4944.131316127757
[UPDATE], Operations, 50113
[UPDATE], AverageLatency(us), 49147.25219005049

another workload, 10 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 10 | tee real-tests-workload-f.log
cat real-tests-workload-f.log

[OVERALL], RunTime(ms), 40801.0
[OVERALL], Throughput(ops/sec), 2450.920320580378
[UPDATE], Operations, 49966
[UPDATE], AverageLatency(us), 12.13715326421967

400 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 400 | tee real-tests-workload-f-400t.log
cat real-tests-workload-f-400t.log

[OVERALL], RunTime(ms), 17856.0
[OVERALL], Throughput(ops/sec), 5600.358422939068
[UPDATE], Operations, 50071
[UPDATE], AverageLatency(us), 14.301591739729584

500 threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 500 | tee real-tests-workload-f-500t.log
cat real-tests-workload-f-500t.log

[OVERALL], RunTime(ms), 17909.0
[OVERALL], Throughput(ops/sec), 5583.784689262382
[UPDATE], Operations, 50210
[UPDATE], AverageLatency(us), 16.105915156343357

1k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 1000 | tee real-tests-workload-f-1kt.log
cat real-tests-workload-f-1kt.log

[OVERALL], RunTime(ms), 16982.0
[OVERALL], Throughput(ops/sec), 5888.5879166175955
[UPDATE], Operations, 50088
[UPDATE], AverageLatency(us), 15.313268647180962

2k threads

./bin/ycsb run hbase \
-p columnfamily=family \
-P workloads/workloadf \
-p columnfamily=family \
-p operationcount=100000 \
-s \
-threads 2000 | tee real-tests-workload-f-2kt.log
cat real-tests-workload-f-2kt.log

[OVERALL], RunTime(ms), 17219.0
[OVERALL], Throughput(ops/sec), 5807.538184563564
[UPDATE], Operations, 49989
[UPDATE], AverageLatency(us), 17.61469523295125

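Even after running these simple scenarios, we can see how, for a given configuration, the number of threads influences the throughput for each workload type:

  • workload a: [chart of throughput vs. number of threads, image not preserved]
  • workload f: [chart of throughput vs. number of threads, image not preserved]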
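
To find the thread count at which throughput peaks for one configuration, the figures from the logs can be fed through a small filter. A minimal sketch, assuming "threads throughput" pairs on standard input (the helper name peak_throughput is our own):

```shell
# Hypothetical helper: read "threads throughput" pairs on stdin and
# print the pair with the highest throughput.
peak_throughput() {
  awk 'NR == 1 || $2 + 0 > best { best = $2 + 0; threads = $1 }
       END { if (NR > 0) printf "%s %.2f\n", threads, best }'
}

# Workload a on the first cluster (2 x m1.large), figures from this post:
printf '%s\n' \
  '10 2121.700755325469' \
  '100 4126.433935792688' \
  '500 4829.518014102193' \
  '1000 4654.626698938745' \
  '2000 4105.427374989737' | peak_throughput
```

For these figures it prints "500 4829.52": on the smallest cluster, workload a peaked at 500 threads and dropped off beyond that.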

You can now play with other instance types and instance counts. You can also run the YCSB benchmark code from multiple nodes at once and observe possible saturation, either of the master’s CPU or of the network layer.

We also invite you to play with the code or even contribute features and improvements, so that others can benefit from them too – have fun!
