HDFS Performance Testing

Testing with Hadoop's built-in TestDFSIO

cd /tmp
sudo -u hdfs hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.0.1-tests.jar TestDFSIO \
-D mapreduce.job.queuename=bf_yarn_pool.production \
-D test.build.data=/tmp/benchmark \
-D mapreduce.output.fileoutputformat.compress=false \
-write -nrFiles 10 -fileSize 1000

– Results

21/01/19 15:29:04 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
21/01/19 15:29:04 INFO fs.TestDFSIO: Date & time: Tue Jan 19 15:29:04 CST 2021
21/01/19 15:29:04 INFO fs.TestDFSIO: Number of files: 10
21/01/19 15:29:04 INFO fs.TestDFSIO: Total MBytes processed: 10000
21/01/19 15:29:04 INFO fs.TestDFSIO: Throughput mb/sec: 23.96
21/01/19 15:29:04 INFO fs.TestDFSIO: Average IO rate mb/sec: 32.37
21/01/19 15:29:04 INFO fs.TestDFSIO: IO rate std deviation: 29.51
21/01/19 15:29:04 INFO fs.TestDFSIO: Test exec time sec: 68
21/01/19 15:29:04 INFO fs.TestDFSIO:

What the result fields mean:

Total MBytes processed: the total amount of data to be written, i.e. nrFiles × fileSize. In the run above: 10 × 1000 = 10000 MB.

Throughput mb/sec: total data written divided by the sum of the time each map task spent actually writing, i.e. 10000 / (map1 write time + map2 write time + …). Note that each individual map's write time is well below Test exec time sec.

Average IO rate mb/sec: the mean over all maps of (data written by that map / that map's write time), i.e. (1000/map1 write time + 1000/map2 write time + …) / 10. Because this is a mean of per-map rates rather than a ratio of totals, it always differs from Throughput.

IO rate std deviation: the standard deviation of the per-map IO rates above.

Test exec time sec: wall-clock time of the whole job.
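To see why Throughput and Average IO rate always diverge, the two formulas can be sketched for a hypothetical run of 3 maps, each writing 1000 MB in 20 s, 40 s and 80 s respectively (illustrative numbers, not taken from the log above):

```shell
#!/bin/sh
# Hypothetical per-map write times in seconds; each map writes 1000 MB.
echo "20 40 80" | awk '{
  mb_per_map = 1000
  total_mb = NF * mb_per_map            # total data written by all maps
  sum_t = 0; sum_rate = 0
  for (i = 1; i <= NF; i++) {
    sum_t += $i                         # accumulate per-map write times
    sum_rate += mb_per_map / $i         # accumulate per-map IO rates
  }
  printf "Throughput mb/sec: %.2f\n", total_mb / sum_t    # ratio of totals
  printf "Average IO rate mb/sec: %.2f\n", sum_rate / NF  # mean of ratios
}'
```

With these times, Throughput is 3000/140 = 21.43 while Average IO rate is (50 + 25 + 12.5)/3 = 29.17: the mean of per-map rates is pulled up by the fastest map, so the two values only coincide when every map runs at the same speed.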

TestDFSIO accepts the following options:

read        read test; a write test must be run first to generate the input files
write       write test
nrFiles     number of files, default 1
fileSize    size of each file, default 1 MB
resFile     name of the local results file, default "TestDFSIO_results.log"
bufferSize  buffer size, default 1000000
clean       remove the benchmark data
seq         whether the data is sequential, default unordered
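For completeness, the matching read test and cleanup might look like the following, reusing the same jar path, queue, and test.build.data directory as the write example above (adjust these to your own cluster):

```shell
cd /tmp
# Read test: reads back the files produced by the earlier -write run.
sudo -u hdfs hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.0.1-tests.jar TestDFSIO \
-D mapreduce.job.queuename=bf_yarn_pool.production \
-D test.build.data=/tmp/benchmark \
-D mapreduce.output.fileoutputformat.compress=false \
-read -nrFiles 10 -fileSize 1000

# Cleanup: remove the benchmark data from HDFS once testing is done.
sudo -u hdfs hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.0.1-tests.jar TestDFSIO \
-D test.build.data=/tmp/benchmark \
-clean
```

The read test reports the same fields as the write test, so the two runs can be compared directly.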

Notes

If the command is not run from the /tmp directory, it fails with a permission error writing TestDFSIO_results.log (the results file is written to the current working directory):

java.io.FileNotFoundException: TestDFSIO_results.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.hadoop.fs.TestDFSIO.analyzeResult(TestDFSIO.java:1068)
at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:891)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:742)

If compression is not disabled, the job fails with "part-00000 does not exist": Snappy compression is enabled by default on this cluster, so the output file is actually named part-00000.snappy:

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/benchmark/io_write/part-00000
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1937)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:728)