in c++ can initialize array value using memset:
const int max = 1000000; int is_prime[max] memset(is_prime, 1, sizeof(is_prime))
what memset does, crudely can described filling array value, doing really fast.
in go can is_prime := make([]int, 1000000)
, create slice 0, in similar manner can use new([1000000]int)
, nothing allow me create array/slice 1 or other non-zero element.
of course can use loop populate value later, main purpose of memset
is way way faster loop.
so go programmers have memset
analog (fast way of initializing array non-zero value)?
the simplest solution loop this:
func memsetloop(a []int, v int) { := range { a[i] = v } }
there no memset
support in standard library, can make use of built-in copy()
highly optimized.
with repeated copy()
we can set first element manually, , start copying set part unset part using copy()
; set part gets bigger , bigger every time (doubles), number of iterations log(n)
:
func memsetrepeat(a []int, v int) { if len(a) == 0 { return } a[0] = v bp := 1; bp < len(a); bp *= 2 { copy(a[bp:], a[:bp]) } }
this solution inspired implementation of bytes.repeat()
. if want create new []byte
filled same values, can use bytes.repeat()
function. can't use existing slice or slices other []byte
, can use presented memsetrepeat()
.
in case of small slices memsetrepeat()
may slower memsetloop()
(but in case of small slices doesn't matter, run in instant).
due using fast copy()
, memsetrepeat()
faster if number of elements grows.
benchmarking these 2 solutions:
var = make([]int, 1000) // size vary func benchmarkloop(b *testing.b) { := 0; < b.n; i++ { memsetloop(a, 10) } } func benchmarkrepeat(b *testing.b) { := 0; < b.n; i++ { memsetrepeat(a, 11) } }
benchmark results
100 elements: ~1.15 times faster
benchmarkloop 20000000 81.6 ns/op benchmarkrepeat 20000000 71.0 ns/op
1,000 elements: ~2.5 times faster
benchmarkloop 2000000 706 ns/op benchmarkrepeat 5000000 279 ns/op
10,000 elements: ~2 times faster
benchmarkloop 200000 7029 ns/op benchmarkrepeat 500000 3544 ns/op
100,000 elements: ~1.5 times faster
benchmarkloop 20000 70671 ns/op benchmarkrepeat 30000 45213 ns/op
the highest performance gain around 3800-4000 elements ~3.2 times faster.
Comments
Post a Comment