hadoop - Merging MapReduce output
I have 2 MapReduce jobs that produce files in 2 separate directories, like so:
directory output1:
------------------
/output/20140102-r-00000.txt
/output/20140102-r-00000.txt
/output/20140103-r-00000.txt
/output/20140104-r-00000.txt

directory output2:
------------------
/output-update/20140102-r-00000.txt
I want to merge these 2 directories into a new directory, /output-complete/, where 20140102-r-00000.txt from /output-update replaces the original file from the /output directory, and the "-r-0000x" part is removed from every file name. The 2 original directories should end up empty, and the resulting directory should look as follows:
directory output3:
-------------------
/output-complete/20140102.txt
/output-complete/20140102.txt
/output-complete/20140103.txt
/output-complete/20140104.txt
What is the best way to do this? Can I use only HDFS shell commands? Or do I need to create a Java program that traverses both directories and does the logic?
You can use Pig:
-- loader() is a placeholder; substitute a real load function such as PigStorage()
get_data = LOAD '/output*/20140102*.txt' USING loader();
STORE get_data INTO '/output-complete/20140102.txt';
Or an HDFS shell command (note that the redirect writes to the local filesystem, not to HDFS):
hadoop fs -cat '/output*/20140102*.txt' > output-complete/20140102.txt
Single quotes may not work; if they don't, try double quotes.
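If the "update replaces original" rule matters, it may be easier to plan the moves explicitly before touching HDFS. Below is a minimal sketch of that idea in Python, not a tested production script. The function name plan_merge and the hard-coded file lists are hypothetical; in practice the lists would come from something like `hadoop fs -ls`. It only prints `hadoop fs -mv` commands as a dry run.

```python
import posixpath
import re

# Matches the reducer suffix (e.g. "-r-00000") sitting just before the extension.
PART_SUFFIX = re.compile(r"-r-\d+(?=\.[^.]+$)")


def plan_merge(base_files, update_files, dest_dir):
    """Return {destination: source}; files from update_files override base_files."""
    plan = {}
    # Base files are processed first, so a later update file with the
    # same cleaned name overwrites the earlier entry.
    for src in list(base_files) + list(update_files):
        name = PART_SUFFIX.sub("", posixpath.basename(src))
        plan[posixpath.join(dest_dir, name)] = src
    return plan


if __name__ == "__main__":
    # Hypothetical listings, taken from the example in the question.
    base = [
        "/output/20140102-r-00000.txt",
        "/output/20140103-r-00000.txt",
        "/output/20140104-r-00000.txt",
    ]
    update = ["/output-update/20140102-r-00000.txt"]
    for dst, src in sorted(plan_merge(base, update, "/output-complete").items()):
        # Dry run only: review the commands before executing them.
        print(f"hadoop fs -mv {src} {dst}")
```

Two caveats with this sketch: files that were superseded by an update are not moved, so they would still need to be removed from /output separately for that directory to end up empty, and a job with more than one reducer (-r-00000, -r-00001, ...) would produce several sources for the same destination name, of which only the last one would be kept.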