
Getting the current split's file name in the Hadoop Map phase

Date: 2016-06-10 01:25  Source: linux.it.net.cn  Author: IT

Sometimes, while processing records in the mapper, we need to know which HDFS file (or HDFS directory) is currently being read. This information can be obtained through the Context object, with code along these lines:

 
 
// FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit (new MapReduce API)
FileSplit fileSplit = (FileSplit) context.getInputSplit();
System.out.println("========> getPath.getName = " + fileSplit.getPath().getName());
System.out.println("========> getPath = " + fileSplit.getPath().toString());
System.out.println("========> getPath.getParent = " + fileSplit.getPath().getParent().toString());
System.out.println("========> getPath.getParent.getName() = " + fileSplit.getPath().getParent().getName());

The log output looks like this:

 
========> getPath.getName = fatal_2015-02-05-04.log
========> getPath = hdfs://mycluster/user/micmiu/demo/nsplog/2015/02/05/7/fatal_2015-02-05-04.log
========> getPath.getParent = hdfs://mycluster/user/micmiu/demo/nsplog/2015/02/05/7
========> getPath.getParent.getName() = 7
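
For reference, below is a minimal sketch of how the snippet above fits into a complete Mapper. It assumes the job uses a FileInputFormat-based input (e.g. TextInputFormat), so getInputSplit() really returns a FileSplit; the class name PathTagMapper, the field inputFileName, and the key/value types are illustrative assumptions, not part of the original article. The file name is resolved once in setup(), since each map task processes exactly one split.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper that tags every record with the name of its source file.
public class PathTagMapper extends Mapper<LongWritable, Text, Text, Text> {

    // One map task handles a single split, so this can be cached per task.
    private String inputFileName;

    @Override
    protected void setup(Context context) {
        // Valid as long as the InputFormat produces FileSplit instances.
        FileSplit fileSplit = (FileSplit) context.getInputSplit();
        inputFileName = fileSplit.getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the source file name as the key, the raw line as the value.
        context.write(new Text(inputFileName), value);
    }
}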



