R: How to run permutations using mclapply in a reproducible way regardless of the number of threads and OS?


Is it possible to run a permutation-based function using mclapply in a reproducible way, regardless of the number of threads and the OS?
Below is a toy example. Hashing the resulting list of permuted vectors is just for convenience when comparing results. I have tried a different RNGkind ("L'Ecuyer-CMRG") and different settings of mc.preschedule and mc.set.seed, but so far I have had no luck making the results identical.

    library("parallel")
    library("digest")

    set.seed(1)
    m <- mclapply(1:10, function(x) sample(1:10),
                  mc.cores = 2, mc.set.seed = FALSE)
    digest(m, 'crc32')

    set.seed(1)
    m <- mclapply(1:10, function(x) sample(1:10),
                  mc.cores = 4, mc.set.seed = FALSE)
    digest(m, 'crc32')

    set.seed(1)
    m <- mclapply(1:10, function(x) sample(1:10),
                  mc.cores = 2, mc.set.seed = FALSE)
    digest(m, 'crc32')

    set.seed(1)
    m <- mclapply(1:10, function(x) sample(1:10),
                  mc.cores = 1, mc.set.seed = FALSE)
    digest(m, 'crc32')

    set.seed(1)
    m <- lapply(1:10, function(x) sample(1:10))
    digest(m, 'crc32')  # the equivalent on Windows
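For reference, the RNGkind("L'Ecuyer-CMRG") variant mentioned in the question might look like the minimal sketch below. It assumes a Unix-alike (mc.cores > 1 is not supported on Windows), compares results with identical() instead of digest so only the parallel package is needed, and uses a hypothetical helper name, perms_with_cores. According to the parallel documentation, with this generator and mc.set.seed = TRUE each forked child is given the next RNG stream, so with the default prescheduling the streams follow the child processes rather than the list elements. That is one plausible reason the result can change with mc.cores.

```r
library("parallel")

## Hypothetical helper: 10 permutations of 1:10, with L'Ecuyer-CMRG streams
## handed out per forked child (mc.set.seed = TRUE, default prescheduling).
## Note: RNGkind() switches the session's RNG kind globally.
perms_with_cores <- function(cores) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(1)
  mc.reset.stream()  # restart stream assignment from the current seed
  mclapply(1:10, function(x) sample(1:10),
           mc.cores = cores, mc.set.seed = TRUE)
}

## mc.cores > 1 is not supported on Windows, so only 1 core is tried there
cores <- if (.Platform$OS.type == "windows") 1L else c(1L, 2L, 4L)
results <- lapply(cores, perms_with_cores)

## TRUE only if the core count had no effect on the permutations
all_same <- all(vapply(results, identical, logical(1), results[[1]]))
print(all_same)
```

No particular outcome is claimed here; the point is only to make the comparison across core counts explicit.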

Output of sessionInfo(), in case it is relevant:

    > sessionInfo()
    R version 3.2.0 (2015-04-16)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.9.5 (Mavericks)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] digest_0.6.8

    loaded via namespace (and not attached):
    [1] tools_3.2.0

Another approach is to first generate and store the samples you will use, and then call mclapply on those stored samples:

    library("parallel")
    library("digest")

    input <- 1:10
    set.seed(1)
    nsamp <- 20
    ## generate and store the random samples
    samples <- lapply(1:nsamp, function(x){ sample(input) })

    ## apply the algorithm "diff" to every sample
    ncore0 <-   lapply(samples, diff)
    ncore1 <- mclapply(samples, diff, mc.cores = 1)
    ncore2 <- mclapply(samples, diff, mc.cores = 2)
    ncore3 <- mclapply(samples, diff, mc.cores = 3)
    ncore4 <- mclapply(samples, diff, mc.cores = 4)

    ## all equal
    all.equal(ncore0, ncore1)
    all.equal(ncore0, ncore2)
    all.equal(ncore0, ncore3)
    all.equal(ncore0, ncore4)

This ensures reproducibility at the expense of more memory and a longer running time. The extra time is usually minor, because the computation done on each sample, not the sampling itself, is typically the time-consuming operation.
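If storing every sample is too memory-hungry, one possible middle ground (a different technique from the one in this answer) is to precompute only one L'Ecuyer-CMRG stream seed per element with parallel::nextRNGStream() and load that seed at the start of each task. Each element's random numbers then depend only on its index, so the result should not depend on mc.cores or on how mclapply schedules the work. This is a sketch under assumptions: the helper names make_streams and permute_task are hypothetical, and it assumes that overwriting .Random.seed inside each task is acceptable for your code.

```r
library("parallel")

## Hypothetical helper: one independent L'Ecuyer-CMRG seed per task.
## Note: set.seed(kind = ...) switches the session's RNG kind globally.
make_streams <- function(n, seed) {
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  s <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", n)
  for (i in seq_len(n)) {
    streams[[i]] <- s
    s <- nextRNGStream(s)  # jump ahead to the next independent stream
  }
  streams
}

## Hypothetical task: load this element's seed, then draw its permutation.
permute_task <- function(i, streams, input) {
  assign(".Random.seed", streams[[i]], envir = globalenv())
  sample(input)
}

input <- 1:10
nsamp <- 20
streams <- make_streams(nsamp, seed = 1)

## Only 1 core is possible on Windows; elsewhere try several core counts.
cores <- if (.Platform$OS.type == "windows") 1L else 1:4
res0 <- lapply(seq_len(nsamp), permute_task, streams = streams, input = input)
res <- lapply(cores, function(k)
  mclapply(seq_len(nsamp), permute_task, streams = streams, input = input,
           mc.cores = k))

## Compare every core count against the serial lapply() result
print(all(vapply(res, identical, logical(1), res0)))
```

Here the stored state is one 7-integer seed per element instead of a whole sample, and the same streams list also works with plain lapply() on Windows. Results are only expected to match within one R version, since sample() itself changed in R 3.6.0.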

Note: using mc.set.seed = FALSE, as in the question, generates the same samples on each core, which is probably not what you want.
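A quick way to see this: with mc.set.seed = FALSE every forked child starts from the parent's RNG state, and with the default prescheduling the mclapply documentation says values are spread across the cores sequentially (first value to core 1, second to core 2, and so on). With two cores, one child handles the odd positions and the other the even ones, so consecutive elements should come out as identical pairs. This minimal sketch assumes a Unix-alike, since mc.cores > 1 is not supported on Windows.

```r
library("parallel")

set.seed(1)
## Both children inherit the same RNG state because mc.set.seed = FALSE.
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores = 2, mc.set.seed = FALSE)

## Element 2k-1 (child 1) and element 2k (child 2) are drawn from the same
## point of the same random stream, so each pair should be identical.
pairs_equal <- vapply(seq(1, 9, by = 2),
                      function(i) identical(m[[i]], m[[i + 1]]),
                      logical(1))
print(pairs_equal)
```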

