python - Google crawl: 503 Service Unavailable


I have a strange problem when I crawl the Google search engine with wget, curl, or Python on my servers. Google redirects me to an address starting with [ipv4|ipv6].google.fr/sorry/indexredirect... and then sends a 503 error, Service Unavailable...

Sometimes the crawl works correctly, but not during the day, and I have tried everything possible: forcing IPv4/IPv6 instead of the hostname, referer, user agent, VPN, .com/.fr, proxies, Tor, ...

I guess the error comes from Google's servers... Any ideas?

wget "http://google.fr/search?q=test"
--2015-06-03 10:19:52--  http://google.fr/search?q=test
Resolving google.fr (google.fr)... 2a00:1450:400c:c05::5e, 173.194.67.94
Connecting to google.fr (google.fr)|2a00:1450:400c:c05::5e|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://ipv6.google.com/sorry/indexredirect?continue=http://google.fr/search%3fq%3dtest&q=cgmsecabqdaauqabaaaaaaaah1qyqpg6qwuigqdxp4nlquhgp_i-oiuu0zshpumazrf3u_0 [following]
--2015-06-03 10:19:53--  http://ipv6.google.com/sorry/indexredirect?continue=http://google.fr/search%3fq%3dtest&q=cgmsecabqdaauqabaaaaaaaah1qyqpg6qwuigqdxp4nlquhgp_i-oiuu0zshpumazrf3u_0
Resolving ipv6.google.com (ipv6.google.com)... 2a00:1450:400c:c05::64
Connecting to ipv6.google.com (ipv6.google.com)|2a00:1450:400c:c05::64|:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2015-06-03 10:19:53 ERROR 503: Service Unavailable.

Google has triggers that sniff out bots and other abuse of its terms of service, and it sets a limit (or "throttle") on the number of calls the same IP address can make over a period of time. I believe it's 10 calls per minute. Case in point: if you paste the URL into a browser when it fails with the 503 error, you'll get a CAPTCHA challenge from Google asking you to prove you are not a bot.
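If you want your crawler to notice when this happens instead of parsing the block page as results, something like the following might help. It is only a sketch: the function name is made up, and it assumes a block always shows up either as HTTP 503 or as a final URL whose path starts with /sorry/, which is what the wget log in the question shows but is not anything Google documents.

```python
from urllib.parse import urlsplit


def looks_throttled(status_code, final_url):
    """Guess whether a response is a block page rather than real results.

    Assumption (taken from the wget log in the question, not from any
    official documentation): a block appears as HTTP 503 or as a redirect
    to a URL whose path starts with /sorry/.
    """
    path = urlsplit(final_url).path
    return status_code == 503 or path.startswith("/sorry/")
```

When this returns True, the safest reaction is probably to stop and wait a while rather than retry immediately.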

I was using the pattern.web module to do the same thing you are doing (for harmless research purposes, of course!), and the documentation for that library shows the throttling limits for popular APIs (Google, Bing, Twitter, Facebook...).

Try sending requests every 15+ seconds or so to avoid tripping the throttle limit.
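A rough way to pace requests like that in plain Python is sketched below. The helper name and its parameters are my own, fetch is whatever download function you already have (for example a small wrapper around urllib.request.urlopen), and the 15-second default is just the guess from above, not a documented limit.

```python
import time


def fetch_all_throttled(urls, fetch, min_interval=15.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call fetch(url) for each URL, starting calls at least min_interval apart.

    fetch is a caller-supplied callable (hypothetical here). clock and sleep
    can be swapped out so the pacing can be checked without really waiting.
    """
    results = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the interval since the last request.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(url))
    return results
```

The interval is measured from the start of one request to the start of the next, so a slow response shortens the following sleep instead of adding to it.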

