TulipTools Internet Business Owners and Online Sellers Community

Full Version: Attack of the Spiders
Does anyone else have a problem with search engine spiders slowing their servers to a crawl while indexing their sites?

The TulipTools forums are, for now, sharing space on one of our dedicated servers with one of our directory sites, which is why you might notice a slowdown at times when spiders are attacking (indexing) the directory site.

The directory site has huge databases (over 2 GB for the directory and over 1 GB for the metasearch), and when multiple spiders hit that site doing deep crawls it puts a huge load on the server. The normal load average on the server ranges from 0.3 to 0.8, but when multiple spiders hit, it jumps way, way up: today MSN and LookSmart were indexing at the same time and the load average jumped to 18 :blinkie:, and yesterday afternoon it was running 35 to 40 :blinkie: when MSN, Yahoo, and Ask Jeeves all arrived at nearly the same time and spent about 45 minutes indexing the site.

The spider-related spikes in server load are usually short, but depending on the length of the crawl they can, on rare occasions, last 45 minutes to an hour.
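
For anyone who wants to keep an eye on this kind of thing, here is a rough Python sketch of how the load numbers above could be bucketed. It is only an illustration, not what we actually run: the `classify_load` function and the `SPIKE_FACTOR` threshold are made up for this example, and only the 0.8 "normal" ceiling comes from the figures in this post. `os.getloadavg()` is the standard library call and works on Unix-like systems only.

```python
import os

NORMAL_MAX = 0.8    # upper end of the normal load range quoted in this post
SPIKE_FACTOR = 10   # assumption: 10x normal or more counts as a spider-sized spike


def classify_load(load_1min, normal_max=NORMAL_MAX, spike_factor=SPIKE_FACTOR):
    """Bucket a 1-minute load average as 'normal', 'elevated', or 'spike'."""
    if load_1min <= normal_max:
        return "normal"
    if load_1min < normal_max * spike_factor:
        return "elevated"
    return "spike"


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages.
    one_min, five_min, fifteen_min = os.getloadavg()
    print(f"load averages: {one_min:.2f} {five_min:.2f} {fifteen_min:.2f}")
    print(f"1-minute load is {classify_load(one_min)}")
```

With these made-up thresholds, today's 18 and yesterday's 35 to 40 would both land in the "spike" bucket, while the usual 0.3 to 0.8 counts as "normal".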

...soo, 1. is it time to stick the databases on a separate dedicated server and just keep the actual site and its programs/pages/etc on this dedicated server? 2. would converting the large databases from MySQL to PostgreSQL maybe help? huh, huh, would it? Big Grin
Never had that problem. . . but I'm small fries in comparison to you guys.

Just curious, is one of those spiders named "ginger"?  Wave
Thefinger
Quote: Just curious, is one of those spiders named "ginger"?  Wave

No spiders named Ginger on the main site on this server.  :Smile

...on this site though, I found this in our stats:

Hits            Files           KBytes          Visits       Hostname
22875  7.45%    16403 19.34%    33640  6.79%    96  5.75%    lilgingeebot
 2677  0.87%     2669  3.15%    25351  5.12%    28  1.68%    crawl-66-249-65-180.googlebot.com


:blinkie: :blinkie: :blinkie: :blinkie: :blinkie:
Why does everyone pick on Ginger? Smileydancing
Quote: lilgingeebot

SHE gets her own spider AND smilie?!?!?!  :blinkie: Angryfire

Happy001
Quote:...soo, 1. is it time to stick the databases on a separate dedicated server and just keep the actual site and its programs/pages/etc on this dedicated server?

One. Signs007
Two. Wars have been started over the MySQL vs. PostgreSQL question  Smileyfencing
Did you move the databases?    Toothy10
Quote: Did you move the databases?    Toothy10

Not yet.  :-[  but we have an excuse.  Big Grin  We just finished changing the backend of the metasearch engine (the spider) so it's now a command-line C program instead of being PHP-based. The part the user sees is still written in PHP. We just put it into service early Saturday, and now we're playing with the timing of the searches, so it could be a while before we move the databases. So far the move to C seems to be helping cut down on the server load... we're still getting hit hard when certain spiders visit (and decide they'd like to use the related search terms that are returned to index the whole friggin' Internet  :Smile  Angryfire ) but not as hard as before.
Quote:SHE gets her own spider AND smilie?!?!?! 

8)