Hi everyone,
We are dealing with a very large Motus database (the .motus file is over 3 GB), which takes several hours to convert to a data frame in R. We want to keep a high temporal resolution of detections.
Has anybody developed ideas to accelerate extraction from a database of this size, other than applying a severe filter on the ts variable?
We wonder whether some of you have already worked with parallel processing and, if so, whether you could give us advice or R code to speed up the extraction. More precisely, is it possible to parallelize this code from the Motus R Book, and if so, how?
sql.motus <- tagme(proj.num, update = TRUE, dir = "./data/")
tbl.alltags <- tbl(sql.motus, "alltags")
df.alltags <- tbl.alltags %>%
  mutate(recvLat = if_else((is.na(gpsLat) | gpsLat == 0),
                           recvDeployLat, gpsLat),
         recvLon = if_else((is.na(gpsLon) | gpsLon == 0),
                           recvDeployLon, gpsLon),
         recvAlt = if_else(is.na(gpsAlt), recvDeployAlt, gpsAlt)) %>%
  select(-noise, -slop, -burstSlop, -done, -bootnum, -mfgID,
         -codeSet, -mfg, -nomFreq, -markerNumber, -markerType,
         -tagDeployComments, -fullID, -deviceID, -recvDeployLat,
         -recvDeployLon, -recvDeployAlt, -speciesGroup, -gpsLat,
         -gpsLon, -gpsAlt, -recvSiteName) %>%
  collect() %>%
  as.data.frame() %>%
  mutate(ts = as_datetime(ts),  # work with dates AFTER transforming to flat file
         tagDeployStart = as_datetime(tagDeployStart),
         tagDeployEnd = as_datetime(tagDeployEnd),
         recvLat = plyr::round_any(recvLat, 0.05),
         recvLon = plyr::round_any(recvLon, 0.05),
         recvDeployName = if_else(is.na(recvDeployName),
                                  paste(recvLat, recvLon, sep = ":"),
                                  recvDeployName))
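For context, here is the kind of approach we have been considering: since the .motus file is just SQLite, the collect() step could in principle be split into time chunks read in parallel, each worker opening its own connection to the file. This is only a minimal, untested sketch under assumptions (the file path "./data/project-XX.motus", the "alltags" view, 8 chunks, and 4 cores are placeholders; mclapply forks and so works on Linux/macOS only, Windows would need parLapply with a cluster):

```r
library(parallel)
library(DBI)
library(RSQLite)
library(dplyr)

motus_path <- "./data/project-XX.motus"  # path to your .motus file (placeholder)

# Query the overall time range once, then cut it into n_chunks intervals
con <- dbConnect(SQLite(), motus_path)
rng <- dbGetQuery(con, "SELECT MIN(ts) AS lo, MAX(ts) AS hi FROM alltags")
dbDisconnect(con)

n_chunks <- 8
breaks <- seq(rng$lo, rng$hi, length.out = n_chunks + 1)
breaks[n_chunks + 1] <- rng$hi + 1  # make the last interval include the max ts

read_chunk <- function(i) {
  con <- dbConnect(SQLite(), motus_path)  # one connection per worker
  on.exit(dbDisconnect(con))
  tbl(con, "alltags") %>%
    filter(ts >= !!breaks[i], ts < !!breaks[i + 1]) %>%  # chunk i only
    collect()
}

# Read the chunks in parallel and stack them back into one data frame;
# the mutate()/select() steps from the code above could go inside read_chunk()
df.alltags <- bind_rows(mclapply(seq_len(n_chunks), read_chunk, mc.cores = 4))
```

We do not know whether concurrent reads on the same SQLite file actually gain much here (the disk may be the bottleneck rather than the CPU), which is exactly what we would like feedback on.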
Thank you
François Gagnon, biologist, M.Sc.
Environment and Climate Change Canada
1550, Av. D'Estimauville, Québec, QC
email: Francois.Gagnon14@canada.ca