Posted December 30th, 2007 by admin
In a recent post we discussed the Robots Exclusion Protocol (REP) and /robots.txt, and how /robots.txt is used to tell search engines and other bots what NOT to crawl on our pages. With /robots.txt we can block any page, directory or folder we don’t want crawled and showing up in search results. For /robots.txt to work, it must be placed in the top-level directory of our web server.
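For comparison, here is a minimal Python sketch of the check a well-behaved crawler makes against /robots.txt before fetching a URL. It uses the standard library's urllib.robotparser. The robots.txt rules, the example.com URLs and the helper name may_crawl are hypothetical. A real crawler would download the file from the site's top-level directory instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical /robots.txt content; a real crawler would fetch it from
# the top-level directory of the site, e.g. http://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
""".splitlines()


def may_crawl(url, robots_lines=ROBOTS_TXT, agent="*"):
    """Hypothetical helper: True if these robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # read the rules from a list of lines
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/notes.html"):
        print(url, "->", "crawl" if may_crawl(url) else "skip")
```

Note that robots.txt works per path on the whole site. It cannot say anything about how a single page's links should be treated.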
What happens if we’re not hosting our own site and don’t have access to the root directory of our domain? This is where the REP META tag comes in. We can place it manually in our HTML files to control the crawling, caching and snippets of our pages. The META tag is especially useful when we can edit the HTML of individual pages on our site but cannot touch /robots.txt. Basically, if we don’t keep sensitive files and have no problem with search engines indexing all our pages, we can do without a /robots.txt file. The REP META tag does the same job as /robots.txt, and in some ways a better one, because it lets us control how each individual page is indexed by search engines, page by page, through the HTML.
What does the “robots” META tag look like and how does it work?
In structure, it is the same as the other meta tags we see when we open our template and view the HTML code. This is the description meta tag for our blog:
<meta name="description" content="A Blog of Blogs That Follow Directory for Do Follow Bloggers. Link Submit site." />
The robots meta tag looks something like this:
<meta name="robots" content="noindex,nofollow" />
Here you’re instructing robots NOT to index the page and NOT to follow any links on it.
You can combine these values to suit your needs. Here are some actual examples of the robots META tag:
- This allows neither indexing of the page nor following of its links:
<meta name="robots" content="noindex,nofollow" />
Or you can use the shorthand value none, which gives the same no-index, no-follow instruction:
<meta name="robots" content="none" />
- This allows the page to be indexed, but tells the robot to ignore any links on it:
<meta name="robots" content="index,nofollow" />
- This blocks archiving (a cached copy) of the page and tells the robot to ignore its links, but still allows the page to be indexed and shown with a snippet. Snippets are allowed by default (only nosnippet turns them off), so no separate snippet value is needed:
<meta name="robots" content="noarchive,nofollow,index" />
- This will not allow indexing of the page, but lets the robot find and follow the links within the page.
<meta name="robots" content="noindex,follow" />
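To make these rules concrete, here is a minimal Python sketch (standard library only) of how a crawler might read the robots META tag and decide whether it may index a page and follow its links. It assumes the usual defaults (index and follow when no tag is present), treats none as noindex,nofollow, and matches case-insensitively. The names RobotsMetaParser and robots_policy are hypothetical. This is not any search engine's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        # Self-closing <meta ... /> tags also arrive here via the default
        # handle_startendtag.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return  # e.g. the description meta tag is ignored
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow), assuming index,follow when no tag is present."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    if "none" in parser.directives:  # shorthand for noindex,nofollow
        return False, False
    return ("noindex" not in parser.directives,
            "nofollow" not in parser.directives)
```

For example, robots_policy('<meta name="robots" content="noindex,follow" />') returns (False, True). Because the sketch only checks for the negative values, a page that says both index and noindex ends up not indexed, which is the cautious reading.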
Where do we put the robots META tag?
We place the meta tag inside the <head> section, which sits at the top of our HTML code. It can go either just after the opening <head> tag or just before the closing </head> tag.
As the robotstxt.org site reminds us:
“There are two important considerations when using the robots <META> tag:
- robots can ignore your <META> tag. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
- the NOFOLLOW directive only applies to links on this page. It’s entirely likely that a robot might find the same links on some other page without a NOFOLLOW (perhaps on some other site), and so still arrives at your undesired page.
Don’t confuse this NOFOLLOW with the rel="nofollow" link attribute.”
The rel=”nofollow” Attribute
The rel="nofollow" attribute is a Google invention that Yahoo and MSN also support. Unlike the robots META tag, it is link-specific, and it has more to do with how a page ranks than with simply following or not following links for indexing. The two are easy to confuse because the end result is almost the same. The difference is that once rel="nofollow" is placed on a specific link, that link does not count as a ‘vote’ in the popularity ranking of the page it points to. The search engine disregards the link and does not follow it to index the target page.
That is why we support removing the rel="nofollow" attribute from our comments page. We value the readers who leave a comment on our posts, and we want Google, Yahoo and MSN to treat their links as just as important as the site they commented on.
Both rel="nofollow" and the robots meta tag have their uses. Say we decided to write a story on “Evil Sites” and could not help but link to some sites we consider evil. One option is to put a robots meta tag on the entire post:
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
This means search engines may index the post but should ignore all the links found on it.
More precisely, we could add a rel="nofollow" attribute to the HTML code of each specific link we don’t want to be associated with:
<a href="http://www.evilsite.com/" rel="nofollow">Evil Site</a>
or, with the attributes in the other order (the order makes no difference):
<a rel="nofollow" href="http://www.evilsite.com/">Evil Site</a>
The last thing we want is to tell Google, Yahoo and MSN that we approve of this site and give it a vote, or that we want to be associated with the Evil Site! In cases like this, the rel="nofollow" attribute is helpful and useful.
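The Evil Sites example can be sketched in the same way. This hypothetical Python helper uses only the standard library. LinkVoteParser and split_links are illustrative names, and the URLs are made up. It sorts a page's links into those that would pass a ‘vote’ and those a search engine is asked to ignore. A link is ignored if the whole page carries a robots NOFOLLOW or if the link itself has rel="nofollow". It is a simplification, not how any search engine actually weighs links.

```python
from html.parser import HTMLParser


class LinkVoteParser(HTMLParser):
    """Hypothetical helper: records page-level NOFOLLOW and each link's rel value."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # True if a robots META tag says nofollow or none
        self.links = []             # (href, has rel="nofollow") in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names; valueless attributes give None.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            tokens = {t.strip().lower() for t in attr.get("content", "").split(",")}
            if "nofollow" in tokens or "none" in tokens:
                self.page_nofollow = True
        elif tag == "a" and "href" in attr:
            # rel may hold several space-separated values, e.g. "external nofollow".
            rel_values = attr.get("rel", "").lower().split()
            self.links.append((attr["href"], "nofollow" in rel_values))


def split_links(html):
    """Return (voted, ignored): hrefs that pass a vote vs. hrefs to ignore."""
    parser = LinkVoteParser()
    parser.feed(html)
    parser.close()
    voted, ignored = [], []
    for href, rel_nofollow in parser.links:
        if parser.page_nofollow or rel_nofollow:
            ignored.append(href)
        else:
            voted.append(href)
    return voted, ignored


if __name__ == "__main__":
    # Hypothetical post: one commenter link, one link marked rel="nofollow".
    post = """<html><head><title>Evil Sites</title></head><body>
    <a href="http://www.example.com/commenter/">A commenter</a>
    <a rel="nofollow" href="http://www.evilsite.com/">Evil Site</a>
    </body></html>"""
    print(split_links(post))
```

Here only the evil site lands in the ignored list. Adding <meta name="robots" content="index,nofollow" /> to that post's head would move the commenter's link there too. That is why the per-link attribute is the finer tool.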
But do we really want to apply the same thinking to readers who leave us a comment, by keeping the rel="nofollow" attribute on our comments page? That would be telling them: thanks for commenting, but we don’t want to be associated with you and you have no value to us… We don’t think so.
Why not give Do Follow a chance, then, and become a Do Follow supporter yourself? Here are some of our helpful posts on Do Follow: