[TUT] Removing Robots from your site! [/TUT]



Lee
22-03-2008, 05:39 PM
If you own a forum or fansite and see robots visiting your site, it can look unprofessional when the "Who's Online" page shows "Google Bot" etc. This guide will show you how to limit or remove them:

To prevent all robots from indexing a page on your site, you'd place the following meta tag into the <HEAD> section of your page:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

To allow other robots to index the page on your site, preventing only Google's robots from indexing the page, you'd use the following tag:
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

To allow robots to index the page on your site but instruct them not to follow outgoing links, you'd use the following tag:
<META NAME="ROBOTS" CONTENT="NOFOLLOW">

To allow robots to index the page on your site but instruct them not to index images on that page, you'd use the following tag:
<META NAME="ROBOTS" CONTENT="NOIMAGEINDEX">
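If you want to double-check which of the tags above a page actually carries, here is a rough Python sketch using only the standard library. The names `RobotsMetaParser` and `may_index` are made up for this example, and it only models the `robots` and `googlebot` meta names from this guide; a real crawler may well interpret things differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        # Maps the lowercased meta name to the set of lowercased directives seen.
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name not in ("robots", "googlebot"):
            return
        content = attr_map.get("content", "")
        parts = {p.strip().lower() for p in content.split(",") if p.strip()}
        self.directives.setdefault(name, set()).update(parts)


def may_index(html, bot="robots"):
    """Return True unless the generic tag or the bot-specific tag says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives.get("robots", set()) | parser.directives.get(bot.lower(), set())
    return "noindex" not in found
```

For example, `may_index('<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">', "googlebot")` comes back `False`, while the same page checked for the generic `"robots"` name comes back `True`.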

Decode
22-03-2008, 05:41 PM
Good Tut :)

Also you missed the "> at the end of the 1st meta tag

Lee
22-03-2008, 05:43 PM
Thanks, edited :)

*REMOVED*

Edited by nvrspk4 (Assistant General Manager): Please do not ask for rep.

Hypertext
22-03-2008, 06:32 PM
Don't use caps, and this won't stop bad bots.

L?KE
22-03-2008, 06:44 PM
Don't use caps, as previously said, and maybe use XHTML, as that is more commonly used now, methinks.

eg:



<meta name="robots" content="noindex, nofollow" />

Tomm
22-03-2008, 10:24 PM
Not all search engines use meta tags; you are best off using robots.txt if you really want to block crawlers. But why would you want to block them anyway? Without them your site would never get indexed on search engines...
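As a rough illustration of the robots.txt route, here is a small Python sketch that uses the standard library's `urllib.robotparser` to see how a well-behaved crawler would read a rule set. The rules, the `can_fetch` helper and the example.com URL are all assumptions made up for this example, and this only shows what the file asks for; it cannot force a bot to obey.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Googlebot to stay out of the whole site,
# while leaving every other crawler unrestricted (empty Disallow).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(robots_txt, user_agent, url):
    """Return True if the rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `can_fetch(ROBOTS_TXT, "Googlebot", "http://example.com/forum/index.php")` is `False`, and the same call for some other crawler name is `True`.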

Independent
23-03-2008, 05:18 AM
Short robots.txt guide (I just wrote it up):

(Sulake's below)

User-agent: *
Disallow: /gallery/image_bank/
Disallow: /careers/apply
Disallow: /contact/index.html

e.g.:

User-agent: *
Disallow: /calon/index.php

Put the file URL after Disallow, e.g. /calon/index.php (if it's in a folder named "calon"):

Disallow: /calon/index.php

And also use 'User-agent: *' at the top of your robots.txt file. Enjoy! (:
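If you have a handful of paths to hide, a short Python sketch like this can write the file for you and check it with the standard library's `urllib.robotparser`. The helper names `build_robots_txt` and `is_blocked` are invented for this example, and the check only reflects how Python's parser reads the rules, which may not match every crawler.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(paths, user_agent="*"):
    """Return robots.txt text: one User-agent line, then one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, url_path, user_agent="*"):
    """Return True if the rules ask user_agent not to fetch url_path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url_path)
```

For instance, `build_robots_txt(["/calon/index.php"])` gives the two-line file from the example, and `is_blocked` on that text is `True` for `/calon/index.php` but `False` for another file in the same folder, since Disallow matches by path prefix.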
