User-agent: *
Disallow: /secret/
Disallow: /error404.php
Disallow: /product-faxform.php5
Disallow: /checkout/
Disallow: /customer/
Disallow: /productsheets/
Disallow: /supplierdata/

### Huasai/1.0, too fast
User-agent: Huasai
Crawl-delay: 10

### LEXI @ 2008-09-03: Allowed faster crawling for any agent below

### sogou.com spider, much too fast
User-agent: Sogou web spider
Crawl-delay: 20

### YoudaoBot
User-agent: YoudaoBot
Crawl-delay: 20

### Labhoo.com spider - hit the bot trap though they say they respect robots.txt
User-agent: Labhoo
Crawl-delay: 10

### Slow-down MSN Bot to one call every 3 minutes
User-agent: msnbot
Crawl-delay: 10

### Slow-down Yahoo Slurp to one call every 4 minutes
User-agent: Slurp
Crawl-delay: 10

### Slow-down Ask Jeeves/Teoma to one call every minute
User-agent: Jeeves/Teoma
Crawl-delay: 10

### Slow-down Exabot to one call every minute
### Never visited recently - just to make sure
User-agent: Exabot
Crawl-delay: 10

### Netluchs.de Crawler - too fast - one call every minute
User-agent: Netluchs/Nutch-0.9-dev
Crawl-delay: 10

### Try to deny
User-agent: InetURL
Disallow: /

### Potential email collectors
User-agent: email
Disallow: /

### A couple of Crawlers we have noticed
#User-agent: West Wind Internet Protocols 4.55
# Requests homepage ~ once an hour
#User-agent: Java/1.5.0_11
# Java/1.4.1_04
# Java/1.4.2_04
# 1.5 Appears to have crawled the entire alpha nav and then left
#User-agent: libwww-perl/5.65
#User-agent: Google-Sitemaps/1.0
# Google Sitemap / Webmaster Tools verification
#User-agent: Snoopy v1.2
# Now 403'd through .htaccess
# A PHP class that emulates a web browser
# http://sourceforge.net/projects/snoopy/
#User-agent: Feedfetcher-Google
# http://www.google.com/feedfetcher.html; 1 subscribers; feed-id=2589828680658507079
#User-agent: AdsBot-Google
# http://www.google.com/adsbot.html
#User-agent: w00tw00t.at.ISC.SANS.DFind
# Evil
#User-agent: Netluchs/Nutch-0.9-dev
# Way too fast
#User-agent: Microsoft URL Control - 6.00.8862
#User-agent: Ask Jeeves/Teoma
#User-agent: MagpieRSS/0.72
# OpenSource RSS Client
#User-agent: ia_archiver
# The Internet Archive / Wayback Machine
#User-agent: Yahoo-MMCrawler/3.x
#User-agent: Snapbot/1.0
# Reads robots.txt
#User-agent: Semager/1.0
#User-agent: SeznamBot/1.0
#User-agent: PHP version tracker (http://www.nexen.net/phpversion/bot.php)
#User-agent: NASA Search 1.0
#User-agent: sogou spider
# Respects robots.txt
#User-agent: VadixBot
# Reads robots.txt
#User-agent: Seekbot/1.0 (http://www.seekbot.net/bot.html) RobotsTxtFetcher/1.2
#User-agent: ApacheBench/2.0.41-dev