R - How to run permutations using mclapply in a reproducible way regardless of number of threads and OS?
Is it possible to run a permutation-based function using mclapply in a reproducible way, regardless of the number of threads and the OS?
Below is a toy example. The hashing of the resulting list of permuted vectors is only there for the convenience of comparing results. I have tried a different RNGkind ("L'Ecuyer-CMRG") and different settings of mc.preschedule and mc.set.seed. So far, no luck in making them all identical.
    library("parallel")
    library("digest")

    set.seed(1)
    m <- mclapply(1:10, function(x) sample(1:10), mc.cores = 2, mc.set.seed = FALSE)
    digest(m, 'crc32')

    set.seed(1)
    m <- mclapply(1:10, function(x) sample(1:10), mc.cores = 4, mc.set.seed = FALSE)
    digest(m, 'crc32')

    set.seed(1)
    m <- mclapply(1:10, function(x) sample(1:10), mc.cores = 2, mc.set.seed = FALSE)
    digest(m, 'crc32')

    set.seed(1)
    m <- mclapply(1:10, function(x) sample(1:10), mc.cores = 1, mc.set.seed = FALSE)
    digest(m, 'crc32')

    set.seed(1)
    m <- lapply(1:10, function(x) sample(1:10))
    digest(m, 'crc32')  # equivalent on Windows
My sessionInfo(), in case it matters:
    > sessionInfo()
    R version 3.2.0 (2015-04-16)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.9.5 (Mavericks)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] digest_0.6.8

    loaded via a namespace (and not attached):
    [1] tools_3.2.0
Another approach is to first generate the samples to use, and then call mclapply on the samples:
    library("parallel")
    library("digest")

    input <- 1:10
    set.seed(1)
    nsamp <- 20

    ## generate and store the random samples
    samples <- lapply(1:nsamp, function(x) {
      sample(input)
    })

    ## apply the algorithm "diff" on every sample
    ncore0 <- lapply(samples, diff)
    ncore1 <- mclapply(samples, diff, mc.cores = 1)
    ncore2 <- mclapply(samples, diff, mc.cores = 2)
    ncore3 <- mclapply(samples, diff, mc.cores = 3)
    ncore4 <- mclapply(samples, diff, mc.cores = 4)

    ## all equal
    all.equal(ncore0, ncore1)
    all.equal(ncore0, ncore2)
    all.equal(ncore0, ncore3)
    all.equal(ncore0, ncore4)
This ensures reproducibility at the expense of using more memory and a slightly longer running time, since the computation done on each sample is typically the most time-consuming operation.
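If storing every sample is too costly in memory, one variation (not from the original answer, just a sketch) is to store a single integer seed per permutation and to call set.seed() inside each task. The helper name perm_diff and the way the seeds are drawn are my own assumptions. Note also that separate integer seeds are not guaranteed to give statistically independent streams, so treat this as a way to get reproducibility rather than as a recommendation on RNG quality.

```r
library(parallel)

input <- 1:10
nsamp <- 20

# Draw one integer seed per permutation up front, in the master process.
set.seed(1)
seeds <- sample.int(.Machine$integer.max, nsamp)

# Hypothetical helper: re-seed inside the task, then permute and compute.
# Each result depends only on its own seed, not on which worker ran it.
perm_diff <- function(seed) {
  set.seed(seed)
  diff(sample(input))
}

# Serial reference result.
ref <- lapply(seeds, perm_diff)

# mclapply() cannot fork on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
par <- mclapply(seeds, perm_diff, mc.cores = cores)

identical(ref, par)
```

Only the vector of seeds is kept in memory, and the sampling itself runs inside the workers.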
Note: the use of mc.set.seed = FALSE in the question will generate the same sample on each core, which is probably not what you want.
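A small check of that note, as a sketch: with mc.set.seed = FALSE the forked workers are expected to inherit the master's RNG state unchanged, so two jobs that land on two different workers should draw the same permutation. The variable name same_draw is mine. On Windows mclapply() falls back to a serial lapply(), so the effect does not show up there.

```r
library(parallel)

# Forking is unavailable on Windows, where mclapply() only accepts mc.cores = 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Fix the master's RNG state so that both forked workers start from it.
set.seed(1)
m <- mclapply(1:2, function(x) sample(1:10),
              mc.cores = cores, mc.set.seed = FALSE)

# With two forked workers and one job each, both jobs start from the same
# inherited RNG state, so the two permutations should coincide.
same_draw <- identical(m[[1]], m[[2]])
same_draw
```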