How to Speak Robot [s.txt] that is.

Robbie the robots.txt robot
Last week’s Tech Tuesday tip featured a bunch of Google’s advanced search tools, and astute reader Nneka rightly pointed out that if I was going to show folks how easy it is for potential hackers to search for the login page of their WordPress blog, I could at least have the decency to show them how to keep that login page from getting into the list in the first place. I agree completely, so here it is…
This article is about the robots.txt file and how to use it to control which parts of your blog or website get indexed and which parts don’t.
What’s a robot?
A robot is a simple piece of software that follows links and indexes web pages. Robots are most commonly used by search engines to find and keep track of web pages. The most famous search engine robot is Googlebot.
What is a robots.txt file?
A simple text file, robots.txt was created as a way to tell indexing robots which files you want to have indexed and which files you don’t want cataloged in a search engine.
It can also be used to control which robots you allow to crawl your site and which ones you want to turn away.
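To make that concrete, here is a minimal sketch in Python of how a well-behaved robot might read such a file. The rules, the example.com URLs and the robot name BadBot are made up for illustration (they are not copied from any real site). The parsing is done by urllib.robotparser from Python's standard library, which may not match every search engine's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only: turn one robot away completely,
# and keep every other robot out of the WordPress login and admin pages.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin/
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# A well-behaved robot asks before it fetches each URL.
print(rules.can_fetch("Googlebot", "https://example.com/"))              # True
print(rules.can_fetch("Googlebot", "https://example.com/wp-login.php"))  # False
print(rules.can_fetch("BadBot", "https://example.com/"))                 # False
```

Each "User-agent" line starts a group of rules for one robot (or for all robots, with the * wildcard), and each "Disallow" line names a path prefix that group should stay out of.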
The rub?
Obeying the robots.txt file is voluntary on the part of the robot. So bad robots can just ignore the robots.txt file, waste your bandwidth and catalog your site anyway.
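The voluntary part is easy to see in code. In this rough Python sketch, the check against robots.txt is just a function that the robot's author chooses to call. The rules, URLs and the robot name GoodBot are hypothetical, and polite_filter is a helper invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules and URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin/
"""


def polite_filter(robots_txt, user_agent, urls):
    """Return only the URLs that robots.txt lets this robot fetch.

    Nothing forces a robot to call this. A bad robot simply skips
    the check and requests every URL it finds.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


found_links = [
    "https://example.com/",
    "https://example.com/wp-login.php",
    "https://example.com/wp-admin/options.php",
    "https://example.com/about/",
]

good_robot_queue = polite_filter(ROBOTS_TXT, "GoodBot", found_links)
bad_robot_queue = found_links  # no check at all, so nothing is skipped

print(good_robot_queue)
```

The server never gets a say here. The file only works when the robot on the other end decides to respect it.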
Why would I want to use a robots.txt?
Primarily, you can use it to keep certain files out of the indexes of the most popular search engines. This brings us back to the search that I demonstrated last week, which looks for a file name segment in Google.
If you run this search you’ll get a list of the login pages for 2.3 million WordPress blogs. None of them are using a robots.txt file, or they are not using it correctly…because there is no good reason to have your login page in the Google index.
What they should have is a simple text file, see mine here, that disallows indexing of that page and the other private sections of the site.
How to use it?
It is simple. If you have a WordPress blog, you can just take the text from mine, paste it into a file called robots.txt and put it in the root [the same place that has your wp-login.php file], and that’s it.
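If you would rather script that step, here is a small Python sketch that writes a robots.txt into a folder. The rules shown are an assumption for a typical WordPress setup, not the contents of the file linked above, and write_robots_txt is a helper made up for this example. A temporary folder stands in for your real site root.

```python
import tempfile
from pathlib import Path

# Hypothetical rules for a WordPress blog. Adjust them before using them.
WORDPRESS_RULES = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin/
"""


def write_robots_txt(site_root, rules=WORDPRESS_RULES):
    """Write rules to robots.txt inside site_root and return the file's path."""
    target = Path(site_root) / "robots.txt"
    target.write_text(rules, encoding="utf-8")
    return target


# Demo: a temporary folder plays the part of the site root.
with tempfile.TemporaryDirectory() as demo_root:
    robots_path = write_robots_txt(demo_root)
    written = robots_path.read_text(encoding="utf-8")

print(written)
```

On a real site you would pass the folder that holds wp-login.php instead of the temporary one, then upload or save the file there.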
Is there more to robots and robots.txt than that?
Yes, a lot. Start here to get more info. And then there are more resources here…like a list of robots and more details about how to use the robots.txt file.
There are also robot meta tags that you can use in a similar manner to the robots.txt file. Create a robots meta tag on the fly with this tool.
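A robots meta tag is a single line in the head of a page. As a rough sketch of what such a tool produces, this Python function builds one. The function name and the short list of accepted directives are choices made for this example. Search engines recognize more directives than the ones listed here.

```python
from html import escape

# Directives this sketch accepts. The list is partial on purpose.
KNOWN_DIRECTIVES = {"index", "noindex", "follow", "nofollow", "noarchive"}


def robots_meta_tag(*directives):
    """Build a robots meta tag for the <head> of a single page."""
    if not directives:
        raise ValueError("give at least one directive, e.g. 'noindex'")
    unknown = [d for d in directives if d.lower() not in KNOWN_DIRECTIVES]
    if unknown:
        raise ValueError(f"unknown directive(s): {', '.join(unknown)}")
    content = ", ".join(d.lower() for d in directives)
    return f'<meta name="robots" content="{escape(content)}">'


print(robots_meta_tag("noindex", "nofollow"))
# <meta name="robots" content="noindex, nofollow">
```

Unlike robots.txt, which covers the whole site from one file, the tag applies only to the page it is placed in.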
Does the robots.txt protect your site…make it more secure?
No, don’t mistake this file for security. It provides none. The best it can do is prevent your site’s pages from showing up in the search engine results.
How to remove your pages from Google’s index?
If your page has been indexed and you’d like to get it out of Google’s search results, you can find out how to remove items from Google.
Final word of caution.
Be careful. The worst thing that can happen with a robots.txt file is that you screw it up and Google takes the most profitable section of your site out of the index. It isn’t really difficult but use robots.txt at your own risk.
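One way to lower that risk is to test your rules before you upload them. The Python sketch below checks a draft robots.txt against a list of pages that must stay crawlable and a list that must be blocked. The URLs and the check_rules helper are hypothetical, and Python's parser is not Googlebot, so treat a clean result as a sanity check and not a guarantee.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_txt, must_allow, must_block, user_agent="Googlebot"):
    """Return a list of problems; an empty list means the rules look sane."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for url in must_allow:
        if not parser.can_fetch(user_agent, url):
            problems.append(f"blocked but should be crawlable: {url}")
    for url in must_block:
        if parser.can_fetch(user_agent, url):
            problems.append(f"crawlable but should be blocked: {url}")
    return problems


# Hypothetical URLs, for illustration only.
MUST_ALLOW = ["https://example.com/", "https://example.com/products/"]
MUST_BLOCK = ["https://example.com/wp-login.php"]

good = "User-agent: *\nDisallow: /wp-login.php\n"
oops = "User-agent: *\nDisallow: /\n"  # a bare slash blocks the whole site

print(check_rules(good, MUST_ALLOW, MUST_BLOCK))  # []
for problem in check_rules(oops, MUST_ALLOW, MUST_BLOCK):
    print(problem)
```

The second draft differs from a harmless file by very little, which is exactly the kind of slip that takes a whole site out of the index.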
Jon Symons
Risking having my site removed from Google, so you don’t have to.


Thanks for the mention.
Have you played with preventing access to your admin folders using .htaccess or password protection on the directory?
I ask because I tried to and for some reason after I password protect the folder I can’t get to it at all (the password prompt does not show up). Any help there?
Hehe:-)
Thanks Jon,
This is always something I’ve been meaning to learn more about, but never had the motivation to go looking for.
Nneka,
Thank you for the story idea. I haven’t tried to password protect the wp-admin folder, but I have done folders before on non-WordPress sites and didn’t have a problem. What control panel are you using? Maybe your hosting company doesn’t have that feature configured properly…it can be a bit tricky, but with enough fiddling and usually, as a last resort, reading the help files, I’ve been able to get it working.
We’ll see if that turns into next week’s tech tip…it’s a good idea!
Dominic, no problem, glad to help out.
Jon,
I love the “Doing x, y, z, so you don’t have to” tagline. Gets me grinning every time - keep it up!
Thanks David, they are a blatant rip-off of Ze Frank, who, for a while, was ending his videos by saying:
“this is Ze Frank, thinking, so you don’t have to”
I thought it was so cool I couldn’t help stealing it [the most sincere form of flattery, after all]. Glad to see the humor comes through; sometimes it’s tough to tell if people are getting my jokes.
I think it’s this version of WP.
I’m using CPanel. I’ve protected other directories with no problem on OSCommerce, WP, and generically. I’m not willing to toy with the version since it’s the latest.
It could also be something upgraded on the server (more likely). It was very frustrating. Just wanted to know if anyone else tried it with 2.0.4
Nneka…I have to confess, I’m still running 2.0.1 on here.
I want to move it to a different hosting account so I’ve been procrastinating on the upgrade, so I can do both at the same time.