  1. #1
    Join Date
    Mar 2005
    Location
    Leeds
    Posts
    3,423
    Tokens
    0

    PHP: Offline browsers [Crawlers & Spiders][+ 6rep]

    Is there a way of telling real users apart from crawlers? I don't want to change what a crawler sees, just log that hit differently in MySQL.

    Could you tell me any functions for this, or a way to do it?

    Thanks guys

  2. #2
    Join Date
    Dec 2004
    Location
    Essex, UK
    Posts
    3,285
    Tokens
    0

    Find out the user agent names of the bots, then log a visit differently when the visitor's user agent matches one of the entries in an array of crawler user agents.

    Alternatively, I'd suggest you get some bot IPs and do the same with them.
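
    A minimal sketch of the array approach, under a few assumptions: the entries in $crawlerAgents are only illustrative, find_crawler() is a hypothetical helper name, and it matches by substring rather than strict equality because real user agent strings usually carry version numbers and URLs around the bot name.

```php
<?php
// Illustrative list of crawler user-agent fragments (an assumption, not a complete list).
$crawlerAgents = array('Googlebot', 'msnbot', 'Slurp', 'ia_archiver');

/**
 * Return the first crawler fragment found in the user agent,
 * or null when none of them matches (treated as an ordinary visitor).
 */
function find_crawler($userAgent, array $crawlerAgents)
{
    foreach ($crawlerAgents as $fragment) {
        // stripos() is a literal, case-insensitive substring search.
        if (stripos($userAgent, $fragment) !== false) {
            return $fragment;
        }
    }
    return null;
}

// The header can be missing entirely, so fall back to an empty string.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// 'crawler' or 'user'; this value could go into its own column in the MySQL stats table.
$visitorType = find_crawler($agent, $crawlerAgents) === null ? 'user' : 'crawler';
```

    The same loop would work for the IP idea by swapping the fragments for address prefixes and checking the start of $_SERVER['REMOTE_ADDR'] instead.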





  3. #3
    Join Date
    Mar 2005
    Location
    Leeds
    Posts
    3,423
    Tokens
    0

    That's what I was thinking of doing as a last resort, since it won't pick up lesser-known crawlers. Anyway, if that's the only way to do it, I'll work on it ;')

  4. #4
    Join Date
    Dec 2005
    Location
    Australia
    Posts
    693
    Tokens
    0

    You could just (if I'm on the right track) use JavaScript: use document.write to place a dynamic image (i.e. image.php) which outputs a 1x1 transparent image wherever you like, while the script itself does whatever logging you wanted.

    To my knowledge, web crawlers don't have JavaScript enabled.
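
    A rough sketch of what image.php could look like, assuming the page pulls it in with something like document.write('<img src="image.php" width="1" height="1" alt="">'). The function names are made up for illustration, describe_hit() is only a stand-in for the real MySQL insert, and the base64 string is a commonly used encoding of a 1x1 transparent GIF.

```php
<?php
// image.php (hypothetical file name taken from the post above).

/** Bytes of a 1x1 transparent GIF, decoded from a commonly used base64 string. */
function transparent_pixel()
{
    return base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

/** Placeholder for the real logging: builds the line you would store in MySQL. */
function describe_hit($ip, $userAgent)
{
    return sprintf('human-hit ip=%s agent="%s"', $ip, $userAgent);
}

// Only log and emit the image when running under a web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    error_log(describe_hit($_SERVER['REMOTE_ADDR'], $agent)); // swap for a MySQL insert

    header('Content-Type: image/gif');
    header('Cache-Control: no-store'); // ask browsers not to cache, so repeat visits are counted
    echo transparent_pixel();
}
```

    Anything that never requests image.php (no JavaScript, or images blocked) simply will not show up in this log, so it only counts clients that run the script.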

  5. #5
    Join Date
    Mar 2005
    Location
    Leeds
    Posts
    3,423
    Tokens
    0

    How do I get the user's or server's DNS/host information? Matching by IP looks like a chore given the number of IPs they use.

    Google uses loads:

    209.185.108, 209.185.253, 216.239.33.96, 216.239.33.97, 216.239.33.98, 216.239.33.99, 216.239.37.98, 216.239.37.99, 216.239.39.98, 216.239.39.99, 216.239.41.96, 216.239.41.97, 216.239.41.98, 216.239.41.99, 216.239.45.4, 216.239.46, 216.239.51.96, 216.239.51.97, 216.239.51.98, 216.239.51.99, 216.239.53.98, 216.239.53.99, 216.239.57.96, 216.239.57.97, 216.239.57.98, 216.239.57.99, 216.239.59.98, 216.239.59.99, 216.33.229.163, 64.233.173.193, 64.233.173.194, 64.233.173.195, 64.233.173.196, 64.233.173.197, 64.233.173.198, 64.233.173.199, 64.233.173.200, 64.233.173.201, 64.233.173.202, 64.233.173.203, 64.233.173.204, 64.233.173.205, 64.233.173.206, 64.233.173.207, 64.233.173.208, 64.233.173.209, 64.233.173.210, 64.233.173.211, 64.233.173.212, 64.233.173.213, 64.233.173.214, 64.233.173.215, 64.233.173.216, 64.233.173.217, 64.233.173.218, 64.233.173.219, 64.233.173.220, 64.233.173.221, 64.233.173.222, 64.233.173.223, 64.233.173.224, 64.233.173.225, 64.233.173.226, 64.233.173.227, 64.233.173.228, 64.233.173.229, 64.233.173.230, 64.233.173.231, 64.233.173.232, 64.233.173.233, 64.233.173.234, 64.233.173.235, 64.233.173.236, 64.233.173.237, 64.233.173.238, 64.233.173.239, 64.233.173.240, 64.233.173.241, 64.233.173.242, 64.233.173.243, 64.233.173.244, 64.233.173.245, 64.233.173.246, 64.233.173.247, 64.233.173.248, 64.233.173.249, 64.233.173.250, 64.233.173.251, 64.233.173.252, 64.233.173.253, 64.233.173.254, 64.233.173.255, 64.68.80, 64.68.81, 64.68.82, 64.68.83, 64.68.84, 64.68.85, 64.68.86, 64.68.87, 64.68.88, 64.68.89, 64.68.90.1, 64.68.90.10, 64.68.90.11, 64.68.90.12, 64.68.90.129, 64.68.90.13, 64.68.90.130, 64.68.90.131, 64.68.90.132, 64.68.90.133, 64.68.90.134, 64.68.90.135, 64.68.90.136, 64.68.90.137, 64.68.90.138, 64.68.90.139, 64.68.90.14, 64.68.90.140, 64.68.90.141, 64.68.90.142, 64.68.90.143, 64.68.90.144, 64.68.90.145, 64.68.90.146, 64.68.90.147, 64.68.90.148, 64.68.90.149, 64.68.90.15, 64.68.90.150, 64.68.90.151, 64.68.90.152, 64.68.90.153, 64.68.90.154, 
64.68.90.155, 64.68.90.156, 64.68.90.157, 64.68.90.158, 64.68.90.159, 64.68.90.16, 64.68.90.160, 64.68.90.161, 64.68.90.162, 64.68.90.163, 64.68.90.164, 64.68.90.165, 64.68.90.166, 64.68.90.167, 64.68.90.168, 64.68.90.169, 64.68.90.17, 64.68.90.170, 64.68.90.171, 64.68.90.172, 64.68.90.173, 64.68.90.174, 64.68.90.175, 64.68.90.176, 64.68.90.177, 64.68.90.178, 64.68.90.179, 64.68.90.18, 64.68.90.180, 64.68.90.181, 64.68.90.182, 64.68.90.183, 64.68.90.184, 64.68.90.185, 64.68.90.186, 64.68.90.187, 64.68.90.188, 64.68.90.189, 64.68.90.19, 64.68.90.190, 64.68.90.191, 64.68.90.192, 64.68.90.193, 64.68.90.194, 64.68.90.195, 64.68.90.196, 64.68.90.197, 64.68.90.198, 64.68.90.199, 64.68.90.2, 64.68.90.20, 64.68.90.200, 64.68.90.201, 64.68.90.202, 64.68.90.203, 64.68.90.204, 64.68.90.205, 64.68.90.206, 64.68.90.207, 64.68.90.208, 64.68.90.21, 64.68.90.22, 64.68.90.23, 64.68.90.24, 64.68.90.25, 64.68.90.26, 64.68.90.27, 64.68.90.28, 64.68.90.29, 64.68.90.3, 64.68.90.30, 64.68.90.31, 64.68.90.32, 64.68.90.33, 64.68.90.34, 64.68.90.35, 64.68.90.36, 64.68.90.37, 64.68.90.38, 64.68.90.39, 64.68.90.4, 64.68.90.40, 64.68.90.41, 64.68.90.42, 64.68.90.43, 64.68.90.44, 64.68.90.45, 64.68.90.46, 64.68.90.47, 64.68.90.48, 64.68.90.49, 64.68.90.5, 64.68.90.50, 64.68.90.51, 64.68.90.52, 64.68.90.53, 64.68.90.54, 64.68.90.55, 64.68.90.56, 64.68.90.57, 64.68.90.58, 64.68.90.59, 64.68.90.6, 64.68.90.60, 64.68.90.61, 64.68.90.62, 64.68.90.63, 64.68.90.64, 64.68.90.65, 64.68.90.66, 64.68.90.67, 64.68.90.68, 64.68.90.69, 64.68.90.7, 64.68.90.70, 64.68.90.71, 64.68.90.72, 64.68.90.73, 64.68.90.74, 64.68.90.75, 64.68.90.76, 64.68.90.77, 64.68.90.78, 64.68.90.79, 64.68.90.8, 64.68.90.80, 64.68.90.9, 64.68.91, 64.68.92, 66.249.64, 66.249.65, 66.249.66, 66.249.67, 66.249.68, 66.249.69, 66.249.70, 66.249.71, 66.249.72, 66.249.78, 66.249.79, 72.14.199, 8.6.48


    Edit: I'll go with the user agent.
    Last edited by Luckyrare; 01-03-2007 at 08:44 PM.
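
    On the DNS/host question: PHP's built-in gethostbyaddr() does the reverse lookup, and gethostbynamel() does the forward one. Below is a sketch of a reverse-then-forward check that avoids keeping an IP list. is_verified_googlebot() is a hypothetical helper, and the googlebot.com / google.com suffixes are an assumption you should verify against Google's own crawler documentation. The two lookup functions are passed in as parameters only so the check can be tried without touching the network.

```php
<?php
/**
 * Reverse-resolve the IP, check the host name suffix, then forward-resolve
 * the host name and confirm it maps back to the same IP.
 *
 * $reverse and $forward default to PHP's own DNS functions; they are
 * parameters so that stubs can be supplied when testing.
 */
function is_verified_googlebot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    $host = call_user_func($reverse, $ip);

    // gethostbyaddr() returns false on malformed input and the unchanged IP on failure.
    if ($host === false || $host === $ip) {
        return false;
    }

    // Assumed suffixes for Google's crawler hosts.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // gethostbynamel() returns an array of IPv4 addresses, or false on failure.
    $addresses = call_user_func($forward, $host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

    Called as is_verified_googlebot($_SERVER['REMOTE_ADDR']), it does two DNS lookups per request, so the result is worth caching per IP if you go this way.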

  6. #6
    Join Date
    Aug 2004
    Location
    UK
    Posts
    11,283
    Tokens
    2,031

    The user agent's the trick. Basically, the user agent of a bot will be substantially different from that of a browser. You don't necessarily need a list of every bot out there either; simply checking for an occurrence of the word "bot" in the browser string will usually turn them up. I'd guess there are probably a number of other differences too, if you compared a few against normal browser strings.
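
    As a sketch of that idea: the word "bot" comes from the post above, while the extra keywords "crawl", "spider" and "slurp" are my own assumption, and looks_like_bot() is just an illustrative name.

```php
<?php
/**
 * Heuristic: treat a visitor as a crawler when the user agent contains
 * one of a few telltale words. Case-insensitive.
 */
function looks_like_bot($userAgent)
{
    // "bot" is the keyword suggested in the thread; the others are assumed additions.
    return preg_match('/bot|crawl|spider|slurp/i', $userAgent) === 1;
}

$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isBot = looks_like_bot($agent);
```

    It will miss crawlers that use a browser-like string, and it can misfire on any browser whose user agent happens to contain one of the words, so treat it as a first pass rather than a definitive test.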

  7. #7
    Join Date
    Mar 2005
    Location
    Leeds
    Posts
    3,423
    Tokens
    0

    Quote: Originally posted by 01101101entor
    The user agent's the trick. Basically, the user agent of a bot will be substantially different from that of a browser. You don't necessarily need a list of every bot out there either; simply checking for an occurrence of the word "bot" in the browser string will usually turn them up. I'd guess there are probably a number of other differences too, if you compared a few against normal browser strings.

    OK, thanks. So something like this would do the trick; I haven't finished it yet. I'm pretty sure I will have to list all of them, as they are all in different formats.

    http://willmacc.wordpress.com/bot-ips/

    PHP Code:
    // eregi() was removed in PHP 7; stripos() does a literal, case-insensitive match.
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    if (stripos($agent, 'google') !== false || stripos($agent, 'gsa-crawler') !== false) {
        // Googlebot
    }
    elseif (stripos($agent, 'search.msn.com') !== false || stripos($agent, 'MS Search 4.0 Robot') !== false) {
        // MSNBot
    }
    elseif (stripos($agent, 'ZyBorg/1.0') !== false) {
        // WISEnut
    }
    elseif (stripos($agent, 'Scooter/3.3Y!CrawlX') !== false) {
        // Alta Vista
    }
    elseif (stripos($agent, 'Ask Jeeves/Teoma') !== false || stripos($agent, 'AskJeeves/Teoma') !== false
         || stripos($agent, 'teoma_agent1') !== false || stripos($agent, 'ask.com') !== false) {
        // Ask Jeeves/Teoma
    }
    elseif (stripos($agent, 'Lycos_Spider_(modspider)') !== false) {
        // Lycos
    }
    elseif (stripos($agent, 'ia_archiver') !== false) {
        // Alexa
    }


  8. #8
    Join Date
    Aug 2004
    Location
    UK
    Posts
    11,283
    Tokens
    2,031

    Latest Awards:


  9. #9
    Join Date
    Mar 2005
    Location
    Leeds
    Posts
    3,423
    Tokens
    0

    Nice, I'll build up a list tomorrow. Thanks for your help.

  10. #10
    Join Date
    Oct 2005
    Location
    Melbourne, Australia
    Posts
    7,554
    Tokens
    0

    Why not just use something like
    PHP Code:
    if (stripos($agent, 'Mozilla/5.0') !== false) {
        // probably a regular browser
    }
    or something?
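
    One caveat with that check, sketched below: some crawlers also announce themselves as Mozilla/5.0. Googlebot's common user agent string is one example, so the test on its own can't separate the two. claims_mozilla5() is a hypothetical helper and the two sample strings are only there for illustration.

```php
<?php
/** True when the user agent contains "Mozilla/5.0" (case-insensitive). */
function claims_mozilla5($userAgent)
{
    return stripos($userAgent, 'Mozilla/5.0') !== false;
}

// Sample user agent strings, for illustration only.
$samples = array(
    'Firefox'   => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2',
    'Googlebot' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
);

foreach ($samples as $name => $userAgent) {
    // Both samples pass the check, the browser and the crawler alike.
    echo $name, ': ', claims_mozilla5($userAgent) ? 'match' : 'no match', "\n";
}
```

    So a Mozilla/5.0 check probably needs to be combined with a crawler check, with the crawler check run first.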
