Implementing a Robots.txt File With Your vBulletin Community
A community owner always wants to increase exposure to his or her site. It goes without saying, then, that Search Engine Optimization (SEO) is one of the most important ways a webmaster can draw in traffic. SEO can take thousands of forms; it is an art in itself, and there are many different ways a site can be optimized for better indexing by Google and other popular engines. We'll be exploring just one of these steps in this article: using a robots.txt file.
Google and the other major search engines continually "crawl" websites across the web and analyze their content using a bot (e.g. Googlebot). Depending on many different factors, this crawl will affect your site's PageRank (PR) and the order of its listings in Search Engine Result Pages (SERPs) when a user types a keyword into the engine.
You want your site to be optimized for the best possible crawl by search engines, and the robots.txt file helps with that. It steers bots toward your best content and saves bandwidth by limiting them to the "worthwhile" areas of your site.
A robots.txt file is very beneficial to any site, but we are going to explore how it relates to vBulletin community (forum) sites. If you are not using vBulletin you can still learn from this article, though you will have to tweak your own robots.txt file.
One thing to keep in mind:
While the robots.txt file can be used to "hide" content from search engines, remember that a search engine bot views your community as a Guest. This means any private forums, moderator forums, and private messages will not be viewed by the bot. Therefore our robots.txt file will tell bots to both a) ignore functions they are not capable of performing as a Guest and b) ignore areas of the forum with no content.
STEP 1: What is the robots.txt file?
As mentioned earlier, the robots.txt file will optimize how a search engine bot crawls your site (hence the name robots.txt). It is easily made with Notepad or any text editor capable of saving a simple text file.
Robots.txt should be placed in the ROOT directory of any website. You can check if a site has a robots.txt by going to websitename.com/robots.txt (e.g. www.relationshipsandmore.com/robots.txt has one, while www.myspace.com/robots.txt does not).
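Because the file always lives in the root directory, its address can be worked out from any page on the site. The following is a minimal Python sketch of that idea; the function name robots_txt_url and the /forum/index.php path are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt always sits in the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical forum page used only as an example.
print(robots_txt_url("http://www.yoursite.com/forum/index.php"))
```

Opening the printed address in a browser is the quickest way to see whether a site has a robots.txt at all.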
STEP 2: Understanding a robots.txt file
Although a robots.txt can be made simply, we are going to explore what the text file consists of.
The robots.txt file consists of two main parts.
1) The User-agent
- The User-agent line specifies the robot. You can specify a single search engine bot that should follow your Disallow rules by typing in, for example, "googlebot." But for the most part you are going to want ANY bot that spiders your site to follow the Disallow rules, and therefore you would put an asterisk (*) as a wildcard.
2) The Disallow
- The Disallow lines list what that robot should stay out of. While you may use the "Allow" command, by default the bot will crawl any directory (visible to a Guest), and therefore you will only want to specify what areas of your community it CAN'T and SHOULDN'T crawl. ("Allow" was not part of the original robots.txt standard, and some bots ignore it completely, which is a further reason to use only "Disallow".)
This allows all bots to crawl your site:
Code:
User-agent: *
Disallow:
This prevents any bot from crawling your site:
Code:
User-agent: *
Disallow: /
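The two extremes (an empty Disallow value, which allows everything, and "Disallow: /", which blocks everything) can be tried out locally before anything is uploaded. This is a rough sketch using Python's standard urllib.robotparser module; real search engine bots use their own parsers, so treat it as an approximation. The helper name build_parser and the /showthread.php path are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (nothing is downloaded)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# An empty Disallow value means "nothing is off limits".
allow_all = build_parser("User-agent: *\nDisallow:\n")
# A single slash matches every path on the site.
block_all = build_parser("User-agent: *\nDisallow: /\n")

# /showthread.php is only an example path for this sketch.
print(allow_all.can_fetch("Googlebot", "/showthread.php"))
print(block_all.can_fetch("Googlebot", "/showthread.php"))
```

The first check should come back True and the second False, since the wildcard record applies to Googlebot as well.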
STEP 3: Structuring a robots.txt file
How can you make the best possible robots.txt file for your vBulletin community? After much testing, this appears to be the best possible combination for optimizing crawls and reducing bandwidth:
Code:
User-agent: *
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /faq.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php
The above can simply be copied and pasted into Notepad, saved as robots.txt and uploaded to your community root directory via FTP.
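Before uploading, it can be worth checking that the rules block what you expect and nothing more. Here is a small sketch along those lines, again using Python's urllib.robotparser as a stand-in for a real bot. Only a handful of the Disallow lines are repeated to keep it short, and the helper is_crawlable and the /showthread.php?t=1 thread address are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rule set; paste the full file here to test all of it.
RULES = """\
User-agent: *
Disallow: /admincp/
Disallow: /printthread.php
Disallow: /register.php
Disallow: /search.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_crawlable(path: str, agent: str = "Googlebot") -> bool:
    """True if the parsed rules let the given bot fetch the path."""
    return parser.can_fetch(agent, path)


# Example paths; the thread URL is assumed, not taken from the rule set.
for path in ("/showthread.php?t=1", "/printthread.php?t=1",
             "/search.php", "/admincp/index.php"):
    print(path, is_crawlable(path))
```

Only the thread page should be reported as crawlable; the printable copy, the search page and the admin panel all match a Disallow prefix.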
If you're simply looking for a quick SEO solution, you can stop after you upload the robots.txt file. If you want to understand (a) why the robots.txt is structured in this way and (b) why those certain directories were disallowed, then continue on...
STEP 4: Deciphering the robots.txt file
The more minimal you can keep your robots.txt file, the better it is for bots. Notice how there are no blank lines between the User-agent line and the Disallow lines. This is important, because a blank line marks the end of a record. It is also possible to use comments ("notes") in your robots.txt, for example:
#stops bot from crawling the "Send PM" page
Disallow: /sendmessage.php
#stops bot from crawling the "register" page
Disallow: /register.php
But it is not recommended. A big mistake some people have made is putting comments on the same line as a rule, which a bot with a less careful parser may misread. If you already have a robots.txt with comments, make sure it is NOT set up like this:
Disallow: /sendmessage.php #stops bot from crawling the "Send PM" page
Disallow: /register.php #stops bot from crawling the "register" page
It is best to remove comments altogether, but if necessary, make sure it is done properly.
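If an existing file already has comments sharing a line with rules, they can be moved mechanically. The following Python sketch shows one way to do it; the function name move_comments_to_own_line is made up for this example, and it assumes every "#" in the file starts a comment.

```python
def move_comments_to_own_line(robots_txt: str) -> str:
    """Put each trailing comment on its own line, above the rule it described."""
    fixed = []
    for line in robots_txt.splitlines():
        rule, hash_mark, comment = line.partition("#")
        rule = rule.rstrip()
        if hash_mark and rule:
            # Comment shared a line with a rule: split them apart.
            fixed.append("#" + comment)
            fixed.append(rule)
        else:
            # Plain rule, blank line, or a comment already on its own line.
            fixed.append(line)
    return "\n".join(fixed) + "\n"


bad = 'User-agent: *\nDisallow: /register.php #stops bot from crawling the "register" page\n'
print(move_comments_to_own_line(bad))
```

Deleting the comment lines afterwards, as recommended above, is then a simple matter of dropping every line that starts with "#".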
STEP 5: So, what did you disallow?
The above robots.txt file is optimized for any vBulletin community. I see no point in listing why ALL of those directories are being disallowed by the robots.txt, but I will outline a few of them:
/sendmessage.php - nothing to crawl in the "Contact Us" form (i.e. sending comments to administrator)
/printthread.php - bot has already crawled your thread, no need for it to also crawl your "printable" version
/admincp - Can't access, so "don't waste time/resources"
/modcp - Can't access, so "don't waste time/resources"
/faq.php - Nothing much it can crawl in your FAQ
/online.php - Why look at Who's Viewing your site? Again, no content.
ENDING BUSINESS: Checking for Common Mistakes
Hopefully you've used the above robots.txt file for your vBulletin community. But here are some common things to check for if you already have one for your community.
1. Is it working? www.yoursite.com/robots.txt - You must be able to access it this way
2. Don't get your User-agent and Disallow lines mixed up: the User-agent line comes first, followed by its Disallow lines.
3. Do NOT put multiple directories on one line:
Disallow: /threadrate.php /usercp.php (BAD!!!)
4. Always edit your robots.txt in UNIX mode and upload in ASCII. Many FTP clients will convert to Unix line endings for you, but some will not.
5. It is better practice to have your #Comments on a line by themselves
6. Capitalization! Do not write it in all capitals! User-agent and Disallow are not case-sensitive, but directory and file names are:
DISALLOW: /JoinRequests.PHP (BAD!!!)
Disallow: /joinrequests.php (Good!)
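Several of the checks above can be automated. What follows is a rough Python sketch of such a checker, not a complete validator; the function name lint_robots_txt and the wording of the warnings are made up for this example. It takes the raw bytes of the file so that line endings can be inspected too.

```python
def lint_robots_txt(raw: bytes) -> list[str]:
    """Return warnings for a few of the common mistakes listed above."""
    warnings = []
    if b"\r" in raw:
        warnings.append("file contains non-Unix line endings")
    lines = raw.decode("ascii", errors="replace").splitlines()
    for number, line in enumerate(lines, start=1):
        rule, hash_mark, _comment = line.partition("#")
        rule = rule.strip()
        if not rule:
            continue  # blank line, or a comment on a line by itself
        if hash_mark:
            warnings.append(f"line {number}: comment shares a line with a rule")
        field, _colon, value = rule.partition(":")
        if field.strip().lower() != "disallow":
            continue
        if len(value.split()) > 1:
            warnings.append(f"line {number}: more than one path on a Disallow line")
        if value != value.lower():
            warnings.append(f"line {number}: path has capitals, check it matches the real name")
    return warnings


SAMPLE = (b"User-agent: *\r\n"
          b"Disallow: /threadrate.php /usercp.php\n"
          b"DISALLOW: /JoinRequests.PHP\n")
for warning in lint_robots_txt(SAMPLE):
    print(warning)
```

The capitalization check can only flag a path for a second look, because the script has no way of knowing how the files are actually named on your server.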
Hopefully this article gave you a little insight into robots.txt and its role in SEO. For those of you already familiar with robots.txt, I hope you learned something new. SEO is interesting and fun, and taking advantage of any way to improve your results in the search engines is a way to improve your site.
Best of Luck!