python - Putting log files into Hive
I have an unstructured file that has data like:
file.log:
2014-03-13 texas 334 4.985
2014-03-13 minnesota 534 6.544
The log file is not consistently tab separated: some fields are tab separated and some are not.
How can I put this into a Hive table?
The Hive table schema is:
create table file (datefact string, country string, state string, id int, value string);
How can I load the log file into the Hive table using Python or Hadoop commands?
Thanks!
With RegexSerDe, you can use \s+ to match any run of whitespace (single spaces, multiple spaces, tabs).
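To see what \s+ does with mixed separators, here is a small Python sketch. It assumes Python's re module treats \s the same way as the Java regex engine behind RegexSerDe for spaces and tabs. The sample line is the first row from the question with the whitespace deliberately mixed, which is only a guess at what the real file looks like.

```python
import re

# First row from the question, with a tab and a run of spaces mixed in
# (the exact whitespace in the real file is an assumption).
line = "2014-03-13\ttexas   334 4.985"

# \s+ treats any run of spaces and/or tabs as a single separator.
fields = re.split(r"\s+", line.strip())
print(fields)
```

Each field comes out separately regardless of which whitespace characters sat between them.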
I don't have a Hive instance in front of me to test this, but the code below should give you the idea.
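-- All columns are string because the contrib RegexSerDe only handles string columns.
-- Backslashes are doubled because Hive string literals treat \ as an escape character.
-- The table is named file, as in the question; file.log would be read as table log in database file.
create table file (
  datefact string,
  country string,
  state string,
  id string,
  value string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "([0-9]{4}-[0-9]{2}-[0-9]{2})\\s+(\\w+)\\s+(\\w+)\\s+(\\d+)\\s+([\\d.]+)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
stored as textfile;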
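Since this is untested against Hive, one way to sanity-check the pattern first is to try it in Python. This is only a sketch: it assumes Python's re agrees with Java's regex engine for these constructs, and the helper name parse_line is made up for illustration. Note that the rows shown in the question have four fields, while the schema and the pattern expect five, so the "US" country value below is a hypothetical stand-in.

```python
import re

# Same pattern as "input.regex" in the table definition.
PATTERN = re.compile(
    r"([0-9]{4}-[0-9]{2}-[0-9]{2})\s+(\w+)\s+(\w+)\s+(\d+)\s+([\d.]+)"
)


def parse_line(line):
    """Return the five captured fields, or None if the line does not match."""
    m = PATTERN.fullmatch(line.strip())
    return m.groups() if m else None


# Hypothetical five-field row ("US" is not in the question's sample data).
print(parse_line("2014-03-13 US\ttexas  334 4.985"))

# Four-field row exactly as shown in the question.
print(parse_line("2014-03-13 texas 334 4.985"))
```

The second call returns None because that row has no country field. As far as I know, RegexSerDe gives NULL for every column of a row that does not match, so it is worth running a few real lines from the log through a check like this before loading.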