Is there a way of telling real users apart from crawlers? I don't want to change what the crawler sees, just log that stat differently in MySQL.
Could you tell me any functions to detect this, or a way to do it?
Thanks guys

Find out the user agent names of the bots and put them in an array. Then, if the visitor's user agent matches one of the entries in that array, just log the visit differently.
Alternatively, I'd suggest you get some bot IPs and do the same with them.
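A rough sketch of the array approach described above, in case it helps. The signature list, the function name `isKnownCrawler` and the `visitor_type` value are all made up for illustration; swap in whatever crawler names you actually collect.

```php
<?php
// Hypothetical list of crawler user-agent fragments; extend it as needed.
$crawlerSignatures = array('googlebot', 'msnbot', 'slurp', 'ia_archiver', 'teoma');

/**
 * Returns true when the user agent contains any of the given fragments
 * (case-insensitive substring match).
 */
function isKnownCrawler($userAgent, array $signatures)
{
    foreach ($signatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// The header can be missing, so fall back to an empty string.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Value to store in a (hypothetical) visitor_type column of the stats table.
$visitorType = isKnownCrawler($agent, $crawlerSignatures) ? 'crawler' : 'user';
```

The page output stays the same for everyone; only the value written to the stats table changes.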
That's what I was thinking of doing as a last resort, as it won't pick up lesser-known crawlers. Anyway, if that's the only way to do it, I'll work on it ;')
You could just (if I'm on the right track) use JavaScript: use document.write to place a dynamic image (i.e. image.php) somewhere on the page. The script just outputs a 1x1 transparent image, but before doing so it does whatever logging you wanted.
As far as I know, web crawlers don't have JavaScript enabled.
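Something along these lines might work for the image.php idea above. It is only a sketch: the function name `transparentPixel` is invented, the logging step is left as a placeholder, and the page would still need a line of JavaScript such as `document.write('<img src="image.php" width="1" height="1" alt="">')` to request it.

```php
<?php
// image.php (the file name comes from the post; everything else is illustrative).

/** Returns the raw bytes of a 1x1 transparent GIF. */
function transparentPixel()
{
    return base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

// Only send the image when called through a web server, not from the CLI.
if (PHP_SAPI !== 'cli') {
    // Placeholder: record the hit here, e.g. mark this visit as coming
    // from a JavaScript-capable client in your stats table.

    header('Content-Type: image/gif');
    header('Cache-Control: no-store'); // ask browsers not to reuse a cached copy
    echo transparentPixel();
}
```

Visitors that never request image.php would then be the ones without JavaScript, which is where the crawlers are expected to end up.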
How do I get the user's or server's DNS/host information? Matching by IP seems like a big task given the number of IPs they use.
Google uses loads:
209.185.108, 209.185.253, 216.239.33.96, 216.239.33.97, 216.239.33.98, 216.239.33.99, 216.239.37.98, 216.239.37.99, 216.239.39.98, 216.239.39.99, 216.239.41.96, 216.239.41.97, 216.239.41.98, 216.239.41.99, 216.239.45.4, 216.239.46, 216.239.51.96, 216.239.51.97, 216.239.51.98, 216.239.51.99, 216.239.53.98, 216.239.53.99, 216.239.57.96, 216.239.57.97, 216.239.57.98, 216.239.57.99, 216.239.59.98, 216.239.59.99, 216.33.229.163, 64.233.173.193, 64.233.173.194, 64.233.173.195, 64.233.173.196, 64.233.173.197, 64.233.173.198, 64.233.173.199, 64.233.173.200, 64.233.173.201, 64.233.173.202, 64.233.173.203, 64.233.173.204, 64.233.173.205, 64.233.173.206, 64.233.173.207, 64.233.173.208, 64.233.173.209, 64.233.173.210, 64.233.173.211, 64.233.173.212, 64.233.173.213, 64.233.173.214, 64.233.173.215, 64.233.173.216, 64.233.173.217, 64.233.173.218, 64.233.173.219, 64.233.173.220, 64.233.173.221, 64.233.173.222, 64.233.173.223, 64.233.173.224, 64.233.173.225, 64.233.173.226, 64.233.173.227, 64.233.173.228, 64.233.173.229, 64.233.173.230, 64.233.173.231, 64.233.173.232, 64.233.173.233, 64.233.173.234, 64.233.173.235, 64.233.173.236, 64.233.173.237, 64.233.173.238, 64.233.173.239, 64.233.173.240, 64.233.173.241, 64.233.173.242, 64.233.173.243, 64.233.173.244, 64.233.173.245, 64.233.173.246, 64.233.173.247, 64.233.173.248, 64.233.173.249, 64.233.173.250, 64.233.173.251, 64.233.173.252, 64.233.173.253, 64.233.173.254, 64.233.173.255, 64.68.80, 64.68.81, 64.68.82, 64.68.83, 64.68.84, 64.68.85, 64.68.86, 64.68.87, 64.68.88, 64.68.89, 64.68.90.1, 64.68.90.10, 64.68.90.11, 64.68.90.12, 64.68.90.129, 64.68.90.13, 64.68.90.130, 64.68.90.131, 64.68.90.132, 64.68.90.133, 64.68.90.134, 64.68.90.135, 64.68.90.136, 64.68.90.137, 64.68.90.138, 64.68.90.139, 64.68.90.14, 64.68.90.140, 64.68.90.141, 64.68.90.142, 64.68.90.143, 64.68.90.144, 64.68.90.145, 64.68.90.146, 64.68.90.147, 64.68.90.148, 64.68.90.149, 64.68.90.15, 64.68.90.150, 64.68.90.151, 64.68.90.152, 64.68.90.153, 64.68.90.154, 
64.68.90.155, 64.68.90.156, 64.68.90.157, 64.68.90.158, 64.68.90.159, 64.68.90.16, 64.68.90.160, 64.68.90.161, 64.68.90.162, 64.68.90.163, 64.68.90.164, 64.68.90.165, 64.68.90.166, 64.68.90.167, 64.68.90.168, 64.68.90.169, 64.68.90.17, 64.68.90.170, 64.68.90.171, 64.68.90.172, 64.68.90.173, 64.68.90.174, 64.68.90.175, 64.68.90.176, 64.68.90.177, 64.68.90.178, 64.68.90.179, 64.68.90.18, 64.68.90.180, 64.68.90.181, 64.68.90.182, 64.68.90.183, 64.68.90.184, 64.68.90.185, 64.68.90.186, 64.68.90.187, 64.68.90.188, 64.68.90.189, 64.68.90.19, 64.68.90.190, 64.68.90.191, 64.68.90.192, 64.68.90.193, 64.68.90.194, 64.68.90.195, 64.68.90.196, 64.68.90.197, 64.68.90.198, 64.68.90.199, 64.68.90.2, 64.68.90.20, 64.68.90.200, 64.68.90.201, 64.68.90.202, 64.68.90.203, 64.68.90.204, 64.68.90.205, 64.68.90.206, 64.68.90.207, 64.68.90.208, 64.68.90.21, 64.68.90.22, 64.68.90.23, 64.68.90.24, 64.68.90.25, 64.68.90.26, 64.68.90.27, 64.68.90.28, 64.68.90.29, 64.68.90.3, 64.68.90.30, 64.68.90.31, 64.68.90.32, 64.68.90.33, 64.68.90.34, 64.68.90.35, 64.68.90.36, 64.68.90.37, 64.68.90.38, 64.68.90.39, 64.68.90.4, 64.68.90.40, 64.68.90.41, 64.68.90.42, 64.68.90.43, 64.68.90.44, 64.68.90.45, 64.68.90.46, 64.68.90.47, 64.68.90.48, 64.68.90.49, 64.68.90.5, 64.68.90.50, 64.68.90.51, 64.68.90.52, 64.68.90.53, 64.68.90.54, 64.68.90.55, 64.68.90.56, 64.68.90.57, 64.68.90.58, 64.68.90.59, 64.68.90.6, 64.68.90.60, 64.68.90.61, 64.68.90.62, 64.68.90.63, 64.68.90.64, 64.68.90.65, 64.68.90.66, 64.68.90.67, 64.68.90.68, 64.68.90.69, 64.68.90.7, 64.68.90.70, 64.68.90.71, 64.68.90.72, 64.68.90.73, 64.68.90.74, 64.68.90.75, 64.68.90.76, 64.68.90.77, 64.68.90.78, 64.68.90.79, 64.68.90.8, 64.68.90.80, 64.68.90.9, 64.68.91, 64.68.92, 66.249.64, 66.249.65, 66.249.66, 66.249.67, 66.249.68, 66.249.69, 66.249.70, 66.249.71, 66.249.72, 66.249.78, 66.249.79, 72.14.199, 8.6.48
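The list above mixes full addresses (216.239.33.96) with prefixes (66.249.64), so a plain in_array() check would miss the prefixes. One possible way to handle both is sketched below; `ipMatchesList` is a made-up helper name, and the three entries are just a sample copied from the list.

```php
<?php
/**
 * Returns true when $ip equals a listed address or falls under a listed
 * prefix. A prefix must be followed by a dot, so the full address
 * "64.68.90.1" does not accidentally match "64.68.90.10".
 */
function ipMatchesList($ip, array $entries)
{
    foreach ($entries as $entry) {
        if ($ip === $entry || strpos($ip, $entry . '.') === 0) {
            return true;
        }
    }
    return false;
}

// A few entries copied from the list above.
$googleEntries = array('66.249.64', '216.239.33.96', '64.68.90.1');

$ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
$isListedIp = ipMatchesList($ip, $googleEntries);
```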
Edit:
I'll go with the user agent.
Last edited by Luckyrare; 01-03-2007 at 08:44 PM.
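On the DNS/host question above: PHP has gethostbyaddr() for turning an IP into a host name, and gethostbyname() for the other direction. A possible sketch follows. It assumes that Google's crawler hosts end in googlebot.com or google.com, and both function names are invented for the example.

```php
<?php
/** True when the host name ends in one of the assumed Google crawler domains. */
function hostLooksLikeGooglebot($host)
{
    return (bool) preg_match('/\.(googlebot|google)\.com$/i', $host);
}

/**
 * Reverse lookup, domain check, then forward lookup to confirm that the
 * host name really maps back to the same IP (IPv4 only).
 */
function isVerifiedGooglebot($ip)
{
    $host = gethostbyaddr($ip); // returns the IP unchanged when the lookup fails
    if ($host === false || $host === $ip || !hostLooksLikeGooglebot($host)) {
        return false;
    }
    // gethostbyname() returns the host name unchanged when it cannot resolve it.
    return gethostbyname($host) === $ip;
}
```

DNS lookups are slow, so it is probably worth storing the result per IP instead of repeating the lookup on every page view.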
The user agent's the trick. Basically, the user agent of a bot will be substantially different from that of a browser. You don't necessarily need to have a list of bots out there either; simply checking for an occurrence of the word "bot" in the user agent string will usually turn them up. I'd guess there are probably a number of other differences in it too, if you compared a few with normal browser strings.
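A quick sketch of the "look for the word bot" idea. The function name `looksLikeBot` and the extra keywords "crawl" and "spider" are my own additions, not part of the suggestion.

```php
<?php
/** Case-insensitive check for generic crawler words in a user agent. */
function looksLikeBot($userAgent)
{
    // The keyword list is an illustrative assumption, not an exhaustive set.
    return (bool) preg_match('/bot|crawl|spider/i', $userAgent);
}

$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isBot = looksLikeBot($agent);
```

This catches names like Googlebot and msnbot without a list, but it misses crawlers whose names contain none of the keywords (ia_archiver, for example), so it may work best combined with a short list of exceptions.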
OK, thanks. So something like this would do the trick; I haven't finished it yet. I'm pretty sure I will have to list all of them, as they are all in different formats.
http://willmacc.wordpress.com/bot-ips/
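PHP Code:
// The header can be missing, so fall back to an empty string.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// stripos() does a case-insensitive substring search. eregi() was removed in
// PHP 7 and treated characters such as "." and "(" as regex syntax.
if (stripos($agent, 'google') !== false || stripos($agent, 'gsa-crawler') !== false) {
    // Googlebot
}
elseif (stripos($agent, 'search.msn.com') !== false || stripos($agent, 'MS Search 4.0 Robot') !== false) {
    // MSNBot
}
elseif (stripos($agent, 'ZyBorg/1.0') !== false) {
    // WISEnut
}
elseif (stripos($agent, 'Scooter/3.3Y!CrawlX') !== false) {
    // Alta Vista
}
elseif (stripos($agent, 'Ask Jeeves/Teoma') !== false || stripos($agent, 'AskJeeves/Teoma') !== false || stripos($agent, 'teoma_agent1') !== false || stripos($agent, 'ask.com') !== false) {
    // Ask Jeeves/Teoma
}
elseif (stripos($agent, 'Lycos_Spider_(modspider)') !== false) {
    // Lycos
}
elseif (stripos($agent, 'ia_archiver') !== false) {
    // Alexa
}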
Hmm, true. Use a DB to help you out?
http://www.robotstxt.org/wc/active/html/index.html
http://www.siteware.ch/webresources/useragents/spiders/
http://www.pgts.com.au/pgtsj/pgtsj0208d.html
etc?
Nice, I'll build up a list tomorrow. Thanks for your help.
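Why not just use something like this, or something?
PHP Code:
if (stripos($agent, 'Mozilla/5.0') !== false) {
    // probably a regular browser (though some crawlers, Googlebot included, also send "Mozilla/5.0")
}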