hadoop - Merging MapReduce output


I have 2 MapReduce jobs that produce files in 2 separate directories, like so:

 directory output1:
 ------------------
 /output/20140102-r-00000.txt
 /output/20140102-r-00000.txt
 /output/20140103-r-00000.txt
 /output/20140104-r-00000.txt

 directory output2:
 ------------------
 /output-update/20140102-r-00000.txt

I want to merge these 2 directories into a new directory, /output-complete/, where the 20140102-r-00000.txt from /output-update replaces the original file from the /output directory, and the "-r-0000x" part is removed from every file name. The 2 original directories will then be empty, and the resulting directory should look as follows:

 directory output3:
 -------------------
 /output-complete/20140102.txt
 /output-complete/20140102.txt
 /output-complete/20140103.txt
 /output-complete/20140104.txt

What is the best way to do this? Can I use only HDFS shell commands? Or do I need to create a Java program that traverses both directories and does the logic?

You can use Pig:

 -- loader() is a placeholder for your load function, e.g. PigStorage()
 get_data = load '/output*/20140102*.txt' using loader();
 store get_data into '/output-complete/20140102.txt';

Or an HDFS shell command:

hadoop fs -cat '/output*/20140102*.txt' > output-complete/20140102.txt 

Single quotes may not work; if so, try double quotes. Note that the shell redirect (>) writes the result to the local filesystem, not to HDFS.
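One caveat: the glob '/output*/20140102*.txt' matches the file in both /output and /output-update, so both approaches concatenate the two files rather than replacing one with the other. If you do end up writing a Java program for the replace-and-rename logic, the core of it could look roughly like the sketch below. It is only a sketch and makes a few assumptions: the class and method names (MergePlan, targetName, plan) are made up for illustration, it works on plain path strings instead of talking to HDFS, and it assumes there is one reducer file per date (with several reducers per date, the stripped names would collide).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: decides which source file ends up under which
// target name. It does not touch HDFS; it only builds the plan.
class MergePlan {

    // "20140102-r-00000.txt" -> "20140102.txt"
    static String targetName(String fileName) {
        return fileName.replaceFirst("-r-\\d+(?=\\.txt$)", "");
    }

    // Last path component, e.g. "/output/a.txt" -> "a.txt"
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Returns target path -> source path, sorted by target path.
    // Files from the update directory override files from the base
    // directory that map to the same target name.
    static Map<String, String> plan(List<String> base,
                                    List<String> updates,
                                    String targetDir) {
        Map<String, String> result = new TreeMap<>();
        for (String source : base) {
            result.put(targetDir + "/" + targetName(baseName(source)), source);
        }
        for (String source : updates) {
            // put() replaces any entry added from the base directory
            result.put(targetDir + "/" + targetName(baseName(source)), source);
        }
        return result;
    }
}
```

On a real cluster you would presumably fill the two lists from FileSystem.listStatus(...) and then call FileSystem.rename(source, target) for each entry of the plan. That part is not shown here, and base files that were overridden would still need to be deleted if the original directories are supposed to end up empty.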

