In an effort to help search engines index my site better, I installed a robots.txt file. The reason why I did this will become clear in a minute. When you run a blog, search engines will crawl your site and index a lot of duplicate data and posts. Duplicate data and posts can have a negative effect because the spider might think that your blog is a spam blog (splog). In the long run this can cause lower rankings.
Why the spiders think that way is pretty logical, at least from how I think they work! If you write a blog post, and it's current, it will be on your main index page (1 copy). At least two more copies will reside in your category and archive sections. Essentially you'll have three copies of the same post if the spider crawls your site that day!
As soon as your post drops off the front page it will end up in your category and archive sections; you’ll still have two copies of the same post! The best solution is to tell the spider NOT to index one of those sections! It’s that simple!
I found a version of an optimized WordPress robots.txt file and discussion here. I then modified it to my needs and installed it in the root directory. So far it seems to be working, but only time will tell how effective it will be. From what I've read, using a robots.txt file is a long-running experiment, and you have to tweak it over time to maximize its effectiveness.
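To give an idea of what I mean, here's a minimal sketch of the kind of rules such a file contains (this isn't my exact file, and the paths are illustrative; the right Disallow lines depend on your permalink structure):

```
# Sketch of a WordPress robots.txt: block the duplicate archive
# sections so the spider only indexes one copy of each post.
# Paths below are examples -- adjust to your own URL structure.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /category/
Disallow: /archives/
Disallow: /feed/
```

The idea is simply that with the category and archive paths disallowed, a post only gets indexed from its permalink (and the front page while it's current), so the spider never sees the extra copies.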
Note to self: flag this post for an update.
Want to leave a comment? If you want to give me some feedback on this post, please contact me via email or on Twitter.