
SEO Fever
Search Engine Optimization
for the World Wide
Web

Everything you've always wanted to know about
search engine optimization, but were afraid to ask.

 
 

 

 

Robots, Spiders and Crawlers - Oh My!

What is a search engine robot?

Yes, robots are everywhere. Search engine robots, spiders, crawlers and bots, that is. Whatever name they go by, there is really nothing to be frightened of. These robots are programs that a search engine sends out to surf web pages. The robots 'spider' through the pages they find to help the crawler-based engines index the vast number of pages on the World Wide Web. Each search engine has its own robot, sometimes more than one, and they all do things a bit differently. This article does not focus on any particular robot or search engine but rather gives you a general overview of robots, spiders and crawlers.

How search engine robots work

In essence, the robot goes out and visits Site A - it reads the metadata and all of the text on the site. And I mean text! Not images, not text rendered as an image - just plain old text. The robot also reads the URLs of any outbound links pointing to Site B and Site C, and will then spider Site B and Site C as well. As you can see, a good linking strategy is very important.
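The visit-read-follow loop described above can be sketched in a few lines of Python. This is only an illustration, not how any particular engine's robot works: the `PAGES` dictionary stands in for the web (a real robot would fetch pages over HTTP), and the `site-a.example` style URLs, `PageReader` and `crawl` are made-up names for this sketch.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real robot would download these.
PAGES = {
    "http://site-a.example/": (
        '<html><head><meta name="description" content="Ballet shoes"></head>'
        '<body><p>We sell ballet shoes.</p>'
        '<img src="logo.png" alt="logo">'
        '<a href="http://site-b.example/">B</a>'
        '<a href="http://site-c.example/">C</a></body></html>'
    ),
    "http://site-b.example/": (
        '<html><body><p>Dance news.</p>'
        '<a href="http://site-a.example/">A</a></body></html>'
    ),
    "http://site-c.example/": "<html><body><p>Shoe care tips.</p></body></html>",
}


class PageReader(HTMLParser):
    """Collects plain text, named meta tags and outbound link URLs."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        # Images are ignored: only text and links are recorded.

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(start_url):
    """Visit start_url, then every page reachable through outbound links."""
    seen = {start_url}
    queue = deque([start_url])
    visited = {}
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # page not available in this toy web
        reader = PageReader()
        reader.feed(html)
        visited[url] = {
            "text": " ".join(reader.text),
            "meta": reader.meta,
            "links": reader.links,
        }
        for link in reader.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


crawled = crawl("http://site-a.example/")
for url, page in crawled.items():
    print(url, "->", page["text"])
```

Starting from Site A alone, the sketch reaches Site B and Site C only because Site A links to them, which is the point made above about linking.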

The pages of Sites A, B and C are then stored in the search engine's database, where they are indexed. So when you do a search on Google, for instance, you aren't searching all the documents on the Internet. You are only looking at the pages that Google has in its index. You can experiment with this by doing the same search in different engines - you'll get varying results. For more information on how each major search engine builds its index, see the article "Who powers whom?".
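To make the idea of "searching the index, not the Internet" concrete, here is a minimal sketch of a word-to-page index in Python. It assumes a simple inverted index and exact word matching; real engines are far more elaborate, and the `CRAWLED` data, `build_index` and `search` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> plain text already read by the robot.
CRAWLED = {
    "http://site-a.example/": "We sell ballet shoes",
    "http://site-b.example/": "Dance news and ballet reviews",
    "http://site-c.example/": "Shoe care tips",
}


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(CRAWLED)
print(sorted(search(index, "ballet")))
print(sorted(search(index, "ballet shoes")))
```

A page that was never crawled never enters `index`, so no query can return it. Two engines that crawled different pages would answer the same query differently.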

Ah, but you may ask yourself - how does the search engine ever find out about my site? As stated before, there are just too many pages out there, and the robot will only crawl the pages it knows about. So you need to let the search engine know your site is out there. But how? First, use a robots.txt file. This file is placed in the root directory of the site, and search engines look for it there. It tells the robot which parts of your site you do or do not want spidered. See the robots.txt file for this site (feel free to use it): it follows the Robots Exclusion Protocol.
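As a rough illustration of how a well-behaved robot reads such a file, the sketch below uses Python's standard `urllib.robotparser` module. The rules in `ROBOTS_TXT`, the `www.example.com` host and the `allowed` helper are invented for this example; they are not the contents of this site's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot may crawl everything
# except the two directories listed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Ask whether the named robot may fetch the given path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(allowed("/index.html"))
print(allowed("/private/notes.html"))
```

Note that the file is a request, not a lock: it only works for robots that choose to honor it.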

As mentioned earlier, you can also build up your link popularity. This is very effective, although it can be a lot of work. You need to get good-quality sites with relevant content to put a link on their site pointing back to yours - the more the better. This will almost always require you to reciprocate by linking your site to theirs (to check your link popularity, try Marketleap's Link Popularity Tool).

The other way to get the search engines to crawl your site is to submit your URL to each search engine directly. You can go to each engine individually and fill out a form so that your URL is considered the next time the robot goes out. Not all of the engines allow this type of submission, and some have dropped the service. Also, read the fine print for each search engine's submission rules: just because you submitted doesn't mean you will be indexed. Again, you can see how important a good linking strategy is, because the spiders will find you through links anyway, without you submitting to the engines themselves.

Identifying search engine robots and spiders

When a search engine robot visits your site, the visit is recorded in your site's log file (statistics) the same way user visits are. You can check your log files to see whether any of these critters have crawled your site. The following was pulled from the log files for this site:

66.249.71.1 - - [04/May/2005:02:12:43 -0400] "GET /robots.txt HTTP/1.0" 200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

Now I know Google has been here. Some of the robots are quite easy to identify (as above): see that name in your log files and you know the Google robot has been to your site for a visit. Keep in mind that search engines can have more than one robot, and you may see some that you cannot recognize (the engine may simply be using a new name for its robot); there are a lot of search engines out there using robots, and the technology changes daily.
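Checking a log by eye gets tedious, so here is a small Python sketch that picks robot visits out of log lines. It assumes the server writes the common "combined" log format (IP address first, user agent last), as in the Googlebot entry quoted above. The `KNOWN_ROBOTS` list is a short illustrative sample, not a complete registry, and the second log line is an invented browser visit for contrast.

```python
import re

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative sample of robot names to look for in the user agent.
KNOWN_ROBOTS = ("googlebot", "slurp", "msnbot")


def robot_name(line):
    """Return the matching robot name, or None for ordinary visitors."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # line is not in the expected format
    agent = match.group("agent").lower()
    for name in KNOWN_ROBOTS:
        if name in agent:
            return name
    return None


LOG_LINES = [
    '66.249.71.1 - - [04/May/2005:02:12:43 -0400] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    # Invented browser visit, for contrast.
    '192.0.2.10 - - [04/May/2005:02:13:01 -0400] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

for line in LOG_LINES:
    print(robot_name(line))
```

Because a visitor can put any name it likes in the user agent, a match here is a strong hint rather than proof that the named robot really called.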

The search engine robots have visited, what next?

OK, so you've got your linking strategy happening and/or you've made manual submissions to individual search engines. Now what? Once the search engines have indexed the sites, they use programs (algorithms) to rank them. This is where good search engine optimization comes into play. You want your site to appear in the Top 10 of the 200,000 sites that an engine returns in the search results for the keyword "ballet shoes" (that is, of course, if you are selling ballet shoes!). Please read "What is SEO?" for more information on optimizing your site for the major search engines.

The spider will be back to crawl your site whenever the engine next sends it out (the engines vary in their scheduling). Make sure you keep your site updated and add new content as often as possible (a page a week if you can manage), and make sure it is keyword rich. Some search engines do have a limit on the number of pages they will crawl on any one site, so optimize well. Last but not least, be patient. This is an iterative process, and it takes time before you see any results. If done properly, you can watch your site climb the rankings to be placed high in the search engine results pages (SERPs).

Helpful links about robots:
Checklist for Search Robot Crawling and Indexing
SpiderHunter.com
Includes tutorial on cloaking scripts and how to track spiders from search engines.
Botspot
A bot monitor site, with regular updates and links to the bots' home pages.

6 May 2005

   