python - Putting log files into Hive


I have an unstructured file with data like this:

file.log:

2014-03-13 texas   334    4.985
2014-03-13    minnesota   534    6.544

The log file is not consistently tab-separated: some fields are separated by tabs and some are not.

How can I put this data into a Hive table?

The Hive table schema is:

create table file (datefact string, country string, state string, id int, value string);

How can I load the log file into the Hive table using Python or Hadoop commands?

Thanks!

With RegexSerDe, you can use \s+ to match any run of whitespace (a single space, multiple spaces, or tabs).
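You can check how the pattern behaves in Python before creating the table. Python's re module and Java's regex engine (which RegexSerDe uses) treat \s, \w and \d the same way for plain ASCII input. The sketch below uses re.fullmatch because RegexSerDe only accepts a line when the pattern matches the whole line. The sample lines are hypothetical and include a country field so that they match the five-column schema.

```python
import re

# Same pattern as the Hive DDL, written as a Python raw string.
# Groups: datefact, country, state, id, value.
PATTERN = re.compile(
    r"([0-9]{4}-[0-9]{2}-[0-9]{2})\s+(\w+)\s+(\w+)\s+(\d+)\s+([\d.]+)"
)


def parse_line(line):
    """Return the five captured fields, or None if the line does not match.

    fullmatch mirrors RegexSerDe, which requires the pattern to cover the
    entire line; only the trailing newline is removed here.
    """
    m = PATTERN.fullmatch(line.rstrip("\r\n"))
    return m.groups() if m else None


# Hypothetical sample lines mixing single spaces, runs of spaces, and tabs.
samples = [
    "2014-03-13 us texas   334\t4.985",
    "2014-03-13\tus\tminnesota   534    6.544",
]
for s in samples:
    print(parse_line(s))
# ('2014-03-13', 'us', 'texas', '334', '4.985')
# ('2014-03-13', 'us', 'minnesota', '534', '6.544')
```

Because the whole line has to match, trailing whitespace at the end of a line makes the match fail unless the pattern ends with \s*.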

I don't have a Hive instance in front of me to test this, but the HiveQL below should give you the idea.

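create table file (
  datefact string,
  country string,
  state string,
  id string,
  value string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "([0-9]{4}-[0-9]{2}-[0-9]{2})\\s+(\\w+)\\s+(\\w+)\\s+(\\d+)\\s+([\\d.]+)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
stored as textfile;

The backslashes in the regex are doubled because Hive treats a backslash as an escape character inside string literals. The contrib RegexSerDe only supports string columns, which is why id is declared as string here.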
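If you would rather not depend on the contrib SerDe (its jar usually has to be added to the session with ADD JAR), a different approach is to normalize the file with Python first and load the result into a plain tab-delimited table. The sketch below assumes five whitespace-separated fields per record and no spaces inside a field (a state such as "new york" would break it). The output path file.tsv is hypothetical. The result could then be loaded with LOAD DATA LOCAL INPATH 'file.tsv' INTO TABLE file; into a table created with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t', and in that case id can stay int as in the original schema.

```python
EXPECTED_FIELDS = 5  # datefact, country, state, id, value


def normalize(lines, expected_fields=EXPECTED_FIELDS):
    """Yield each record as one tab-separated line.

    str.split() with no argument splits on any run of spaces or tabs and
    drops leading/trailing whitespace, so mixed separators collapse into
    single field boundaries. Blank lines and lines with the wrong number
    of fields are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) == expected_fields:
            yield "\t".join(fields) + "\n"


def convert(src_path, dst_path):
    """Write a tab-separated copy of src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(normalize(src))
```

For example, convert("file.log", "file.tsv") writes the cleaned copy. Skipping malformed lines keeps the load from producing NULL-filled rows, but you may prefer to log them instead of dropping them silently.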
