algorithm - Java Unique Number Generator collision test runs out of memory -


i have system requirement generate 11 characters string 8 rightmost characters must unique.

now understanding, @ happens few hundreds of times per day. due speed concerns, asked avoid using db retrieve nextval() in sequence unfortunately.

so left test various ways generate random number possible, , i've come solution based on securerandom class.

i decided test it, see how generated string repeat itself; tested using hashmap (string, string) 10 million generations - looks good, , hoping test night billion random strings, has failed due exception in thread "main" java.lang.outofmemoryerror: java heap space

the test code have far this:

public class main { public static biginteger base = biginteger.valueof(62); public static final string digits = "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";  public static void main(string[] args) {     // todo auto-generated method stub     long lstarttime = system.nanotime();     hashmap<string, string> orders = new hashmap<string, string>();      (int = 0; < 960000000; i++) {         securerandom randobj = new securerandom();         biginteger bigrand = new biginteger(128, randobj);         string rand = bigrand.tostring(62);          stringbuilder result = new stringbuilder();         while (bigrand.compareto(biginteger.zero) == 1 && result.length()<11) { // number > 0               biginteger[] divmod = bigrand.divideandremainder(base);               bigrand = divmod[0];               int digit = divmod[1].intvalue();               result.insert(0, digits.charat(digit));             }         string doeskeyexiststring = orders.get(result);         if (doeskeyexiststring != null) {             system.out.print("duplicate key found!: "+result.tostring()+"\n");         } else {             orders.put(result.tostring(), result.tostring());  // no such key         }     }     long lendtime = system.nanotime();     long difference1 = lendtime - lstarttime;     double difference = (double)difference1/1000000000;     system.out.println("elapsed seconds: " + difference);     system.out.println("elapsed exact: " + difference1);  } 

do have suggestions how prove can rely on method of generating random numbers, likelyhood of getting same string twice small enough?

i stumbled across question: random number generator test answer looks interesting, didn't quite understand how apply case (statistics hardest course, barely passed second attempt...)

i not sure, how adjust random generator dynamically set length of generated number.. there have better ways did here...

thanks!

let's take @ raw numbers here.

you're trying store 1 billion 11 character long strings in hashmap.

if calculate absolutely minimum space (11 character array + int length) gives us:

1e9 * (11 * 2 bytes + 4 bytes) = 26e9 bytes 

or 24 gigabytes. how memory solution requires.

if @ other side of equation. you're looking randomly generate 2 equal strings of length 8 using 62 character alphabet. means have

62 ^ 8 = 218340105584896 

or 2.18e14 different combinations. looking @ birthday problem can calculate number of strings need generate have probability of @ least 50% have generated same string twice. following formula, number 1.74e7 times. if generate 18 million strings probability have generated same string twice more 50%.

18 million strings should require

1.8e7 * (11 * 2 bytes + 4 bytes) = 4.68e8 bytes 

or 470 megabytes, should within limitations.

now, actual problem - use random uuid if possible since can rely on possibility of generating same uuid twice is, practical purposes, nonexistant.

if can't use uuid have use 8 characters suggest expand alphabet bit. using printable ascii characters (95 characters) increase number of possible combinations less 6.1e15 - though number of generations 50% chance of collision increases around 90 million.


Comments