i have system requirement generate 11 characters string 8 rightmost characters must unique.
now understanding, @ happens few hundreds of times per day. due speed concerns, asked avoid using db retrieve nextval() in sequence unfortunately.
so left test various ways generate random number possible, , i've come solution based on securerandom class.
i decided test it, see how generated string repeat itself; tested using hashmap (string, string) 10 million generations - looks good, , hoping test night billion random strings, has failed due exception in thread "main" java.lang.outofmemoryerror: java heap space
the test code have far this:
public class main { public static biginteger base = biginteger.valueof(62); public static final string digits = "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"; public static void main(string[] args) { // todo auto-generated method stub long lstarttime = system.nanotime(); hashmap<string, string> orders = new hashmap<string, string>(); (int = 0; < 960000000; i++) { securerandom randobj = new securerandom(); biginteger bigrand = new biginteger(128, randobj); string rand = bigrand.tostring(62); stringbuilder result = new stringbuilder(); while (bigrand.compareto(biginteger.zero) == 1 && result.length()<11) { // number > 0 biginteger[] divmod = bigrand.divideandremainder(base); bigrand = divmod[0]; int digit = divmod[1].intvalue(); result.insert(0, digits.charat(digit)); } string doeskeyexiststring = orders.get(result); if (doeskeyexiststring != null) { system.out.print("duplicate key found!: "+result.tostring()+"\n"); } else { orders.put(result.tostring(), result.tostring()); // no such key } } long lendtime = system.nanotime(); long difference1 = lendtime - lstarttime; double difference = (double)difference1/1000000000; system.out.println("elapsed seconds: " + difference); system.out.println("elapsed exact: " + difference1); }
do have suggestions how prove can rely on method of generating random numbers, likelyhood of getting same string twice small enough?
i stumbled across question: random number generator test answer looks interesting, didn't quite understand how apply case (statistics hardest course, barely passed second attempt...)
i not sure, how adjust random generator dynamically set length of generated number.. there have better ways did here...
thanks!
let's take @ raw numbers here.
you're trying store 1 billion 11 character long strings in hashmap
.
if calculate absolutely minimum space (11 character array + int length) gives us:
1e9 * (11 * 2 bytes + 4 bytes) = 26e9 bytes
or 24 gigabytes. how memory solution requires.
if @ other side of equation. you're looking randomly generate 2 equal strings of length 8 using 62 character alphabet. means have
62 ^ 8 = 218340105584896
or 2.18e14
different combinations. looking @ birthday problem can calculate number of strings need generate have probability of @ least 50% have generated same string twice. following formula, number 1.74e7
times. if generate 18 million strings probability have generated same string twice more 50%.
18 million strings should require
1.8e7 * (11 * 2 bytes + 4 bytes) = 4.68e8 bytes
or 470 megabytes, should within limitations.
now, actual problem - use random uuid if possible since can rely on possibility of generating same uuid twice is, practical purposes, nonexistant.
if can't use uuid have use 8 characters suggest expand alphabet bit. using printable ascii characters (95 characters) increase number of possible combinations less 6.1e15 - though number of generations 50% chance of collision increases around 90 million.
Comments
Post a Comment