database - Load the R output each time into a MySQL DB -
I have a loop that iterates over each row in a data frame and generates 5 output files, appending records to the same files on every iteration - the code works fine.
for (n in 1:nrow(data)) {
  # initialize required variables
  # (some function ---)
  # write output files
  df1 <- data.frame(key, value)
  df2 <- data.frame(key, value)
  df3 <- data.frame(key, value)
  df4 <- data.frame(key, value)
  df5 <- data.frame(key, value)
  write.table(df1, file = paste(pradd, "table1.tsv", sep = "/"), append = TRUE, quote = FALSE, sep = "\t")
  write.table(df2, file = paste(pradd, "table2.tsv", sep = "/"), append = TRUE, quote = FALSE, sep = "\t")
  write.table(df3, file = paste(pradd, "table3.tsv", sep = "/"), append = TRUE, quote = FALSE, sep = "\t")
  write.table(df4, file = paste(pradd, "table4.tsv", sep = "/"), append = TRUE, quote = FALSE, sep = "\t")
  write.table(df5, file = paste(pradd, "table5.tsv", sep = "/"), append = TRUE, quote = FALSE, sep = "\t")
}
However, I want to load the output directly from R into 5 MySQL tables. Since these 5 files are large, I need to use a library that loads them into the DB tables faster.
I am using RODBC to connect to MySQL. I can read MySQL tables into R after establishing the connection, but I am not sure how to load the 5 tables each time the loop runs.
Update: I have tried the code below - it does load, via sqlQuery, but it is slow. My loop processing is fast; it's the database load that is slow.
for (i in 1:nrow(data)) {
  # (-- some function creates dataset1 to dataset5 ---)
  # collect results in dataset1, put them in a .txt file, and use that .txt file to load the DB
  write.table(dataset1, "file1.txt", append = FALSE, sep = "\t")
  sqlQuery(channel, "LOAD DATA LOCAL INFILE 'file1.txt' INTO TABLE lta_r_db.file1 FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n';")
  # collect results in dataset2, put them in a .txt file, and use that .txt file to load the DB
  write.table(dataset2, "file2.txt", append = FALSE, sep = "\t")
  sqlQuery(channel, "LOAD DATA LOCAL INFILE 'file2.txt' INTO TABLE lta_r_db.file2 FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n';")
  write.table(dataset3, "file3.txt", append = FALSE, sep = "\t")
  sqlQuery(channel, "LOAD DATA LOCAL INFILE 'file3.txt' INTO TABLE lta_r_db.file3 FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n';")
  write.table(dataset4, "file4.txt", append = FALSE, sep = "\t")
  sqlQuery(channel, "LOAD DATA LOCAL INFILE 'file4.txt' INTO TABLE lta_r_db.file4 FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n';")
  write.table(dataset5, "file5.txt", append = FALSE, sep = "\t")
  sqlQuery(channel, "LOAD DATA LOCAL INFILE 'file5.txt' INTO TABLE lta_r_db.file5 FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n';")
}
How can I improve the processing speed of the code above?
Here you can find a way with fictitious data. The following code tests repeated insertion of a list of data.frames into a MySQL DB with RMySQL::dbWriteTable, using base::mapply and parallel::mcmapply.
Consider that, in general, the fewer the accesses to the DB (and the more data per access), the better it is. As an alternative, you may try to pack the data into larger tables in R and then export them to MySQL.
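The "fewer, larger accesses" idea can be sketched as follows (a minimal sketch with hypothetical names; the dbWriteTable call is shown commented out because it needs a live connection like my.con below):

```r
## Instead of one tiny insert per loop iteration, accumulate the
## per-iteration results in a list and do a single bulk write at the end.
chunks <- vector("list", 100)
for (i in 1:100) {
  ## stand-in for the real per-iteration computation
  chunks[[i]] <- data.frame(id = i, value = rnorm(1))
}
## pack everything into one larger table ...
big <- do.call(rbind, chunks)
## ... and export it to MySQL in a single access, e.g.
## dbWriteTable(my.con, "table1", big, append = TRUE, row.names = FALSE)
nrow(big)  # 100 rows, written with one DB access instead of 100
```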
However, on the MySQL side:
## mysql> SHOW TABLES;
## +----------------+
## | Tables_in_test |
## +----------------+
## | table1         |
## | table2         |
## | table3         |
## | table4         |
## | table5         |
## +----------------+
## 5 rows in set (0.00 sec)
##
## mysql> DESCRIBE table1;
## +-------+---------+------+-----+---------+-------+
## | Field | Type    | Null | Key | Default | Extra |
## +-------+---------+------+-----+---------+-------+
## | id    | int(11) | YES  |     | NULL    |       |
## | value | float   | YES  |     | NULL    |       |
## +-------+---------+------+-----+---------+-------+
## 2 rows in set (0.00 sec)
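The same schema can also be created from R; a sketch (the dbSendQuery loop is commented out since it assumes the open my.con connection used below):

```r
## DDL matching the DESCRIBE output above, generated for all five tables.
ddl <- sprintf("CREATE TABLE table%d (id INT(11), value FLOAT)", 1:5)
## for (q in ddl) dbSendQuery(my.con, q)
ddl[1]  # "CREATE TABLE table1 (id INT(11), value FLOAT)"
```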
Then:
## ----------------------------------------------------------
##   A few benchmarks
## ----------------------------------------------------------

## Create the data
id <- 1:5
val <- rnorm(5)

## You need to have a list of data.frames, e.g.
my.df <- data.frame(id = id, value = val)
df.list <- split(my.df, my.df$id)
## otherwise
## df.list <- list(df1, df2, df3, df4, df5)
tab.names <- paste("table", 1:5, sep = "")

## Number of repetitions
reps <- 100

clean <- function() {
  dbSendQuery(my.con, "DELETE FROM table1")
  dbSendQuery(my.con, "DELETE FROM table2")
  dbSendQuery(my.con, "DELETE FROM table3")
  dbSendQuery(my.con, "DELETE FROM table4")
  dbSendQuery(my.con, "DELETE FROM table5")
}

## ----------------------------------------------------------
library(RMySQL)
library(parallel)

my.con <- dbConnect(MySQL(), host = "localhost", dbname = "test",
                    user = "your.username", password = "your.password")

clean()
system.time(
  for (i in 1:reps) {
    mapply(dbWriteTable, name = tab.names, value = df.list,
           MoreArgs = list(conn = my.con, append = TRUE, row.names = FALSE))
  })

clean()
system.time(
  for (i in 1:reps) {
    mcmapply(dbWriteTable, name = tab.names, value = df.list,
             MoreArgs = list(conn = my.con, append = TRUE, row.names = FALSE),
             mc.cores = 8, mc.preschedule = FALSE)
  })

clean()
dbDisconnect(my.con)
rm(my.con)
I didn't find any increased speed using parallelization. In the end, at least in this setting, I would use the simple:
mapply(dbWriteTable, name = paste("table", 1:5, sep = ""), value = df.list,
       MoreArgs = list(conn = my.con, append = TRUE, row.names = FALSE))
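If the mapply pattern is unfamiliar: name and value are iterated pairwise, while MoreArgs supplies the fixed arguments to every call. A self-contained toy with a stand-in writer function (no DB needed; fake.write is hypothetical):

```r
## Stand-in for dbWriteTable: just reports which table got how many rows.
fake.write <- function(name, value, conn, append, row.names) {
  paste(name, nrow(value), sep = ":")
}
my.df <- data.frame(id = 1:5, value = rnorm(5))
df.list <- split(my.df, my.df$id)
res <- mapply(fake.write,
              name = paste("table", 1:5, sep = ""),
              value = df.list,
              MoreArgs = list(conn = NULL, append = TRUE, row.names = FALSE))
unname(res)  # "table1:1" "table2:1" "table3:1" "table4:1" "table5:1"
```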
I don't use ODBC a lot on Unix, so I couldn't test against RODBC. HTH.

Cheers