Banned for Being Badly Behaved Idiots...
11-20-2005, 11:45 AM,
Post: #1
Banned for Being Badly Behaved Idiots...
We banned the following annoying spiders from accessing the directory site that TulipTools shares a server with (and here's how you can ban them too):
RufusBot: it obeys robots.txt, so you can ban it by adding this to your robots.txt file:
Code:
User-agent: RufusBot
Disallow: /

Microsoft URL Control - 6.00.8862: doesn't obey robots.txt, but it can be banned if you have Apache mod_rewrite enabled by adding this to your .htaccess file:
Code:
Options +FollowSymlinks

EasyDL/3.04: another one that doesn't obey robots.txt; ban it by adding the following to your .htaccess file:
Code:
<Limit GET POST>

These are all spam harvesting bots; ban them all by adding the following to your .htaccess file:
Code:
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
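Only the first line of each of those .htaccess blocks is shown above. For anyone who wants the whole picture, here's a minimal sketch of what the complete rules typically look like, assuming Apache with mod_rewrite and mod_setenvif enabled; the user-agent strings come from the list above, but the surrounding directives are the standard Apache idiom rather than an exact copy of our file, so test before relying on it:
Code:
# block Microsoft URL Control and EasyDL by user-agent (mod_rewrite)
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL [NC]
RewriteRule .* - [F,L]

# mark spam harvesters, then refuse GET/POST requests from anything marked
# (add one SetEnvIfNoCase line per harvester user-agent you want to ban)
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
If the mod_rewrite block throws a 500 error, the module isn't loaded or your AllowOverride setting doesn't permit it; the SetEnvIf block only needs mod_setenvif.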
"Well, Jay was so giddy that someone named Jay was involved with this site we posted our first non-eBay listing in 3 years here at Lunarbid (we tried two items at Yahoo once upon a time, they bombed)" -Marie posting in a LunarBid thread at OTWA in 2005 wins the award for 'most moronic reason ever given for choosing a venue"
"thanks twat u must have nothing better 2 do. do u talk to all your members like that. will not be recomending your site. best way to put it is TULIPTOOLS.COM IS REALLY SHIT. DONT JOIN." -pubescent owner of rinky dink off2auction.com in 2011 |
11-20-2005, 01:25 PM,
Post: #2
Re: Banned for Being Badly Behaved Idiots...
Add some more spidering idiots to the list, but these we can't ban.
We turned off the Apache web server for a few minutes a little while ago, A. just to annoy TulipTools users and snicker at the thought of all of you getting the white 'can't connect' error messages, and B. so we could check our log files and see exactly which idiot spiders have been arriving en masse the past few days in the early morning and again in the late afternoon, causing our server to slow to a crawl or become inaccessible for short periods of time (they're hitting the MySQL database with 250-300 simultaneous requests).

The winners of the Spiderboinktard Award are... Yahoo Slurp and Yahoo Slurp China, which both arrive (with lots of little children) at the same time... and about 5 minutes after they arrive, LookSmart's WiseNutJob shows up with a few friends to join the party... grrrrr... and none of them appear to be obeying the rules.
"Well, Jay was so giddy that someone named Jay was involved with this site we posted our first non-eBay listing in 3 years here at Lunarbid (we tried two items at Yahoo once upon a time, they bombed)" -Marie posting in a LunarBid thread at OTWA in 2005 wins the award for 'most moronic reason ever given for choosing a venue"
"thanks twat u must have nothing better 2 do. do u talk to all your members like that. will not be recomending your site. best way to put it is TULIPTOOLS.COM IS REALLY SHIT. DONT JOIN." -pubescent owner of rinky dink off2auction.com in 2011 |
11-21-2005, 10:59 PM,
Post: #3
Re: Banned for Being Badly Behaved Idiots...
Quote:
http://help.yahoo.com/help/us/ysearch/sl...rp-03.html

How can I reduce the number of requests you make on my web site?

There is a Yahoo! Slurp-specific extension to robots.txt which allows you to set a lower limit on our crawler request rate. You can add a "Crawl-delay: xx" instruction, where "xx" is the delay in seconds between successive crawler accesses. If the crawler rate is a problem for your server, you can set the delay up to 5 or 20 seconds, or whatever value is comfortable for your server. Setting a crawl-delay of 20 seconds for Yahoo! Slurp would look something like:

Code:
User-agent: Slurp
Crawl-delay: 20
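Worth noting: a single robots.txt can carry both the outright ban from the first post and a per-crawler delay, as separate User-agent records. The layout below is just the standard robots.txt convention, not a copy of our actual file:
Code:
# ban RufusBot entirely (it honors robots.txt)
User-agent: RufusBot
Disallow: /

# slow Yahoo! Slurp to one request every 20 seconds
User-agent: Slurp
Crawl-delay: 20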
11-22-2005, 09:40 PM,
Post: #4
Re: Banned for Being Badly Behaved Idiots...
rose Wrote:
There is a Yahoo! Slurp-specific extension to robots.txt which allows you to set a lower limit on our crawler request rate.

We added the Crawl-delay for both Slurp and MSNbot Sunday night. We started out at 5 seconds for Slurp; that didn't work, so we upped it to 10, which still didn't work, and then increased it to 15. We survived the heavy morning Slurp crawl unscathed today.

We didn't make it through last night, though. We restarted Apache 5 times last evening to shake the spiders. Slurp, Slurp China, MSNbot, Googlebot, WiseNutBot, Lycos, Dir.com, and a few others all arrived at once: 40 spiders unleashed in our database, using the cache to do metasearches on the net. Hundreds of searches a minute, thousands of database queries a minute.

On the bright side, 112,000 pages of that site are indexed in Yahoo now and the number is increasing daily.
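So, assuming the final value described above, the Slurp record in our robots.txt ended up looking roughly like this (the 15-second figure is from this post; the MSNbot delay is covered in the next reply):
Code:
User-agent: Slurp
Crawl-delay: 15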
11-23-2005, 11:20 AM,
Post: #5
Re: Banned for Being Badly Behaved Idiots...
We survived the spiders last night.
A Yahoo search for site:----.com now yields 126,000 results, a search for ----.com yields 487,000 results, and a search for link:www.----.com yields 128,000 results. I'm happy. TulipTools is another story: site:tt.com returns 227, community.tt.com 271, and link:tt.com 5,310. Search engines have been slow to index the forums on any site where we've used this forum software.

There's a Crawl-delay for MSNbot too. Delay times can be 5 to 120 seconds. We're using 40 on the search site.

Code:
User-agent: msnbot
Crawl-delay: 40
11-23-2005, 11:51 PM,
Post: #6
Re: Banned for Being Badly Behaved Idiots...
Quote:
We survived the spiders last night.
11-24-2005, 10:48 AM,
(This post was last modified: 11-24-2005, 11:50 AM by mandy.)
Post: #7
WebmasterWorld Removed From Google, MSN Search After It Banned Their Spiders
A related article on sites with a large number of pages having trouble with ill-mannered spiders slowing their servers to a snail's pace:
Quote:
After banning spiders from crawling its site last week, WebmasterWorld has been delisted from Google and MSN, and is sure enough to be delisted soon from Yahoo...
full article: http://www.searchenginejournal.com/index.php?p=2560

Quote:
WebmasterWorld head Brett Tabke decided to ban all search spiders including those from the major search engines in an effort to combat bandwidth loss and server sluggishness due to rogue spiders. Brett figured he had about 60 days until he'd see pages get dropped. It took two.
full article: http://blog.searchenginewatch.com/blog/051123-093904
11-24-2005, 06:53 PM,
Post: #8
Re: Banned for Being Badly Behaved Idiots...
When I read the title, I thought you were going to tell us YOU were banned from somewhere.
11-26-2005, 10:29 AM,
Post: #9
Update on our Spider Problems and Google's Removal of WebmasterWorld
Updates:
A. We did not survive last night's invasion of the spiders. At one point we had 97 :blinkie: spiders simultaneously playing around on our directory/metasearch site. Any of you who visited that site or this site after 9 pm EST last night no doubt noticed the sites were unresponsive--now you know why.

B. There's no doubt now that Google has responded to WebmasterWorld's banning of Googlebot by doing a manual ban of the Internet's 279th busiest web site (as ranked by Alexa). Google has removed the site from its directory, all 2 million previously indexed site pages have been removed from Google Search, and WW now has a PR of 0 across all Google data centers.