TulipTools Internet Business Owners and Online Sellers Community
Banned for Being Badly Behaved Idiots...

  
11-20-2005, 11:45 AM,
Post: #1
bargainbloodhound Offline
Lawnmower Mouth
********
Posts: 4,372
Likes Given: 0
Likes Received: 4 in 4 posts
Joined: Jul 2005
Reputation: 0
Banned for Being Badly Behaved Idiots...
We banned the following annoying spiders from accessing the directory site that TulipTools shares a server with (and here's how you can ban them too):

RufusBot: it obeys robots.txt, so you can ban it by adding this to your robots.txt file:

Code:
User-agent: RufusBot
Disallow: /
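
To check that a rule like the one above does what you expect before uploading it, something like the following Python sketch may help. It uses the standard library's urllib.robotparser; the example.com URL and the "SomeOtherBot" name are placeholders, not anything from this thread, and a real crawler may interpret robots.txt slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule from the post above.
ROBOTS_TXT = """\
User-agent: RufusBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are hypothetical placeholders.
rufus_allowed = is_allowed(ROBOTS_TXT, "RufusBot", "http://example.com/page.html")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/page.html")
print("RufusBot allowed:", rufus_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

This only tells you what a well-behaved spider should do. It does nothing about spiders that ignore robots.txt.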

Microsoft URL Control - 6.00.8862: doesn't obey robots.txt, but if you have Apache's mod_rewrite enabled you can ban it by adding this to your .htaccess file:

Code:
# Send 403 Forbidden to any request whose User-Agent contains "Microsoft URL Control"
Options +FollowSymLinks
RewriteEngine On
RewriteBase /

RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control"
RewriteRule .* - [F,L]

EasyDL/3.04: another one that doesn't obey robots.txt. Ban the host name and IP address it crawls from by adding the following to your .htaccess file:

Code:
# Allow everyone except the listed host name and IP address
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from d57-8-78.home.cgocable.net
Deny from 24.57.249.53
</Limit>

These are all spam harvesting bots. Ban them all by adding the following to your .htaccess file:

Code:
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot

<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
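
The SetEnvIfNoCase lines above match the start of the User-Agent header (the ^ anchor) without regard to case. As a rough illustration of which agents would get flagged, here is a minimal Python sketch of the same matching logic. It is not how Apache implements it, and the sample User-Agent strings in the usage lines are invented.

```python
import re

# Prefixes taken from the SetEnvIfNoCase lines above.
BAD_BOT_PREFIXES = [
    "EmailSiphon",
    "EmailWolf",
    "ExtractorPro",
    "CherryPicker",
    "NICErsPRO",
    "Teleport",
    "EmailCollector",
]

# One anchored, case-insensitive pattern, mirroring "^Name" with NoCase.
BAD_BOT_RE = re.compile(
    "^(?:" + "|".join(re.escape(p) for p in BAD_BOT_PREFIXES) + ")",
    re.IGNORECASE,
)


def is_bad_bot(user_agent: str) -> bool:
    """Return True if the User-Agent starts with one of the listed names."""
    return BAD_BOT_RE.search(user_agent) is not None


# Invented sample User-Agent strings, for illustration only.
print(is_bad_bot("emailsiphon/1.0"))           # starts with a listed name
print(is_bad_bot("Mozilla/4.0 (EmailWolf)"))   # name appears, but not at the start
```

Because of the ^ anchor, a bot that puts its name anywhere other than the start of the header slips through. Dropping the ^ would catch those too, at the cost of more false positives.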




"Well, Jay was so giddy that someone named Jay was involved with this site we posted our first non-eBay listing in 3 years here at Lunarbid (we tried two items at Yahoo once upon a time, they bombed)" -Marie posting in a LunarBid thread at OTWA in 2005 wins the award for 'most moronic reason ever given for choosing a venue"

"thanks twat u must have nothing better 2 do. do u talk to all your members like that. will not be recomending your site.
best way to put it is TULIPTOOLS.COM IS REALLY SHIT. DONT JOIN." -pubescent owner of rinky dink off2auction.com in 2011
11-20-2005, 01:25 PM,
Post: #2
bargainbloodhound Offline
Lawnmower Mouth
********
Posts: 4,372
Likes Given: 0
Likes Received: 4 in 4 posts
Joined: Jul 2005
Reputation: 0
Re: Banned for Being Badly Behaved Idiots...
Add some more spidering idiots to the list :) but these we can't ban.

We turned off the Apache web server for a few minutes a little while ago (A) just to annoy TulipTools users and snicker at the thought of all of you getting the white 'can't connect' error messages, and (B) so we could stop our log files and see exactly which idiot spiders have been arriving en masse the past few days in the early morning and again in the late afternoon, causing our server to slow to a crawl or be inaccessible for short periods of time (because they're hitting the MySQL database with 250-300 simultaneous requests).

The winners of the Spiderboinktard Award are....Yahoo Slurp and Yahoo Slurp China are both arriving (with lots of little children) at the same time...and about 5 minutes after they arrive LookSmart's WiseNutJob shows up with a few friends to join the party...grrrrr...and none of them appear to be obeying the rules
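
One way to find out which spiders are responsible, as described above, is to tally the User-Agent field in the Apache access log. A minimal Python sketch, assuming the log uses the combined format (User-Agent as the last quoted field on each line). The sample log lines and the spider names in them are made up for illustration and are not from our logs.

```python
import re
from collections import Counter

# In the combined log format the User-Agent is the last quoted field on the line.
# Note: this simple pattern does not handle escaped quotes inside the field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per User-Agent string in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; real use would iterate over the access log file instead.
SAMPLE_LOG = [
    '10.0.0.1 - - [20/Nov/2005:17:01:02 -0500] "GET /search?q=a HTTP/1.1" 200 512 "-" "ExampleSpider/1.0"',
    '10.0.0.2 - - [20/Nov/2005:17:01:02 -0500] "GET /search?q=b HTTP/1.1" 200 498 "-" "ExampleSpider/1.0"',
    '10.0.0.3 - - [20/Nov/2005:17:01:03 -0500] "GET /index.html HTTP/1.1" 200 1024 "-" "OtherCrawler/2.1"',
]

counts = count_user_agents(SAMPLE_LOG)
for agent, hits in counts.most_common():
    print(hits, agent)
```

Sorting by hit count puts the heaviest crawlers at the top, which is usually enough to see who to rate-limit or ban first.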

:P
11-21-2005, 10:59 PM,
Post: #3
rose Offline
Big Member
*****
Posts: 465
Likes Given: 0
Likes Received: 0 in 0 posts
Joined: Jul 2005
Reputation: 0
Re: Banned for Being Badly Behaved Idiots...
:)

http://help.yahoo.com/help/us/ysearch/sl...rp-03.html

How can I reduce the number of requests you make on my web site?

There is a Yahoo! Slurp-specific extension to robots.txt which allows you to set a lower limit on our crawler request rate.

You can add a "Crawl-delay: xx" instruction, where "xx" is the delay in seconds between successive crawler accesses. If the crawler rate is a problem for your server, you can set the delay up to 5 or 20 or a comfortable value for your server.

Setting a crawl-delay of 20 seconds for Yahoo! Slurp would look something like:


   
Code:
User-agent: Slurp
Crawl-delay: 20
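
If you want to confirm that a Crawl-delay line is being parsed the way you intend, Python's standard urllib.robotparser can read the value back (Python 3.6 or later). This is only a sketch of how one parser sees the file. Whether a given spider honours the delay is up to the spider.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt fragment from the Yahoo! help page quoted above.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 20
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (in seconds) that applies to user_agent, if any."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# The bare "Slurp" token is passed here, not a full User-Agent header.
slurp_delay = crawl_delay_for(ROBOTS_TXT, "Slurp")
print("Slurp crawl delay:", slurp_delay)
```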

http://www.gentoo.org/
11-22-2005, 09:40 PM,
Post: #4
mandy Offline
Administrator
*******
Posts: 9,932
Likes Given: 0
Likes Received: 6 in 5 posts
Joined: Feb 2011
Reputation: 0
Re: Banned for Being Badly Behaved Idiots...
rose Wrote:There is a Yahoo! Slurp-specific extension to robots.txt which allows you to set a lower limit on our crawler request rate.

We added the Crawl-delay for both Slurp and MSNbot Sunday night. We started out at 5 seconds for Slurp, which didn't work; upped it to 10, which still didn't work; and then increased it to 15. We survived the heavy morning Slurp crawl unscathed today.

We didn't make it through last night though. We restarted Apache 5 times last evening to shake the spiders. Slurp, Slurp China, MSNbot, Googlebot, WiseNutBot, Lycos, Dir.com, and a few others all arrived at once: 40 spiders unleashed in our database, using the cache to do metasearches on the net. That's hundreds of searches a minute and thousands of database queries a minute :) On the bright side, 112,000 pages of that site are indexed in Yahoo now and the number is increasing daily.
11-23-2005, 11:20 AM,
Post: #5
mandy Offline
Administrator
*******
Posts: 9,932
Likes Given: 0
Likes Received: 6 in 5 posts
Joined: Feb 2011
Reputation: 0
Re: Banned for Being Badly Behaved Idiots...
We survived the spiders last night.   

A Yahoo search for site:----.com now yields 126,000 results, a search for ----.com yields 487,000 results, and a search for link: www.----.com yields 128,000 results. I'm happy. :)

TulipTools is another story: site: tt.com 227, community.tt.com 271, and link: tt.com 5,310. Search engines have been slow to index the forums on any site where we've used this forum software.

The Crawl-delay for MSNbot: delay times can be 5 to 120 seconds. We're using 40 on the search site.

   
Code:
    User-agent: msnbot
    Crawl-delay: 120


11-23-2005, 11:51 PM,
Post: #6
rose Offline
Big Member
*****
Posts: 465
Likes Given: 0
Likes Received: 0 in 0 posts
Joined: Jul 2005
Reputation: 0
Re: Banned for Being Badly Behaved Idiots...
Quote:We survived the spiders last night.

Thumbsup
11-24-2005, 10:48 AM, (This post was last modified: 11-24-2005, 11:50 AM by mandy.)
Post: #7
mandy Offline
Administrator
*******
Posts: 9,932
Likes Given: 0
Likes Received: 6 in 5 posts
Joined: Feb 2011
Reputation: 0
WebmasterWorld Removed From Google, MSN Search After It Banned Their Spiders
A related article on sites with a large number of pages having trouble with ill-mannered spiders slowing their servers to a snail's pace:

Quote:After banning spiders from crawling its site last week, WebmasterWorld has been delisted from Google and MSN, and is sure enough to be delisted soon from Yahoo...

Brett Tabke of WebmasterWorld explains “We have pushed the limits of page delivery, banning, ip based, agent based, and down right cloaking to avoid the rogue bots - but it is becoming an increasingly difficult problem to control.”

full article: http://www.searchenginejournal.com/index.php?p=2560

Quote:WebmasterWorld head Brett Tabke decided to ban all search spiders including those from the major search engines in an effort to combat bandwidth loss and server sluggishness due to rogue spiders. Brett figured he had about 60 days until he'd see pages get dropped. It took two.

As of this moment, site:webmasterworld.com at Google shows NO pages being listed from the site. Prior to the ban, about 2 million pages were listed.

... this is indicative of Google manually pulling everything about the site from Google.

full article: http://blog.searchenginewatch.com/blog/051123-093904
11-24-2005, 06:53 PM,
Post: #8
Anita Offline
Tulip Fanatic
*******
Posts: 2,157
Likes Given: 0
Likes Received: 0 in 0 posts
Joined: Jul 2005
Reputation: 0
Re: Banned for Being Badly Behaved Idiots...
When I read the title, I thought you were going to tell us YOU were banned from somewhere. ;)
11-26-2005, 10:29 AM,
Post: #9
mandy Offline
Administrator
*******
Posts: 9,932
Likes Given: 0
Likes Received: 6 in 5 posts
Joined: Feb 2011
Reputation: 0
Update on our Spider Problems and Google's Removal of WebmasterWorld
Updates:

A. We did not survive last night's invasion of the spiders. At one point we had 97 spiders simultaneously playing around on our directory/metasearch site. Any of you who visited that site or this site after 9 pm EST last night no doubt noticed the sites were unresponsive. Now you know why.

B. There's no doubt now that Google has responded to WebmasterWorld's banning of Googlebot by doing a manual ban of the Internet's 279th busiest web site (as ranked by Alexa). Google has removed the site from its directory, all 2 million previously indexed site pages are now removed from Google Search, and WW now has a PR of 0 across all Google data centers.