Mar 25 2009

Apache access_log to Google Earth KML

Where are the visitors to kahunaburger.com coming from? I admitted in the past that I’m log-file-junkie. There’s usually a terminal window on desktop that runs multitail (or my own airlog) against a number of log-files on various servers. Line after line scrolls by as people hit those servers. The ip-address does not tell you too much about the visitor and I always wanted to see where those ip-addresses are located.

Just the other day I saw link to a free IP address geolocation SQL database (thanks Marc for making that one available). I downloaded the 11MB file and added the database to my mysql server.

Next, I created a simple perl script, that walks over my web servers (apache) access_log, extracts ip-addresses, access-date/time and url and finally converts all those items (using above mentioned database) into a KML file that can be fed to Google Earth.

The result looks like this in Google Earth:
2009-03-25-log2kml

And here’s the script that does the magic (it assumes that you have stored the database in “ipinfo”):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
use strict;
use DBI;
 
my %seen;
my $dbh = DBI->connect("dbi:mysql:ipinfo","username","password");
die "unable to connect to database" unless ($dbh);
# kml header
print qq{< ?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
\t<document>
};
# loop over access_log lines
while(<>) {
    # does it look like an access log entry?
    next unless (/^(\d+)\.(\d+)\.(\d+)\.(\d+).*\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+HTTP\/\d.\d/);
    my($a,$b,$c,$d,$date,$method,$uri)=($1,$2,$3,$4,$5,$6,$7);
    # make sure we have a good ip address
    next if ($a < 0 || $a > 255 || $b < 0 || $b > 255 ||
             $c < 0 || $c > 255 || $d < 0 || $d > 255);
    # did we see this IP already?
    next if $seen{"$a.$b.$c.$d"}++;
    # compute value for ipinfo lookup
    my($val)=($a*256+$b)*256+$c;
    # fetch ipinfo data
    # WARNING! for whatever reason the code-beautifier adds an extra space between the < and = below
    # WARNING! that space has to be removed in your code!
    my($r)=$dbh->selectall_arrayref(qq{select country_code,region_code,city,latitude,longitude
        from ip_group_city where ip_start< =$val order by ip_start desc limit 1});
    # no information? no placemark!
    next if !defined($r);
    print qq{\t\t<Placemark>\n};
    print qq{\t\t\t<name>$a.$b.$c.$d</name>\n};
    print qq{\t\t\t<description>\n< ![CDATA[\n};
    print qq{<b>$method $uri from $r->[0]->[0]/$r->[0]->[1]/$r->[0]->[2] at <i>$date</i><br />\n};
    print qq{]]>\n\t\t\t</description>\n};
    print qq{\t\t\t<point>\n};
    print qq{\t\t\t\t<coordinates>$r->[0]->[4],$r->[0]->[3]</coordinates>\n};
    print qq{\t\t\t</point>\n};
    print qq{\t\t\n};
}
# kml trailer
print qq{\t</document>
</kml>
};
$dbh->disconnect();

And you run the above script via:

perl log2kml.pl < access_log > output.kml

Here’s a little sample file from my web-server. Each ip-address is only recorded once, so if you have the same person visit your site several times, only the first hit will be shown and subsequent ones are ignored: log2kml.kmz (51K – click to open it in Google Earth)

Next up is a version that does live-tracking: as the web server is hit, Google Earth will automatically rotate to the location associated with the ip-address ๐Ÿ™‚

10 Responses to “Apache access_log to Google Earth KML”

  • Google Earth as a traceroute viewer | kahunaburger Says:

    […] played a bit more with the idea that I presented in “Apache access_log to Google Earth KML” and, I think, I came up with something extremely […]

  • HaarithDevarajan Says:

    Great Stuff, thanks.

    There is small typo in the code though. not really a typo more of a space.

    here ip_start < = $val
    should be
    here ip_start <= $val

    otherwise we get
    here ip_start select country_code from ip_group_city where ip_start < = 14203352 order by ip_start desc limit 1;
    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near โ€˜= 14203352 order by ip_start desc limit 1โ€ฒ at line 1

  • Tobias Says:

    Thanks for catching this one – no idea why the copy & paste added that extra space.

    Cheers – Tobias

    PS: something (or somebody) does not want to keep the <= characters together - I added a warning in the code and need to figure out what keeps on inserting that extra space ...

  • HaarithDevarajan Says:

    Just Brilliant Tobi. Great work.

    I am still playing with your code. saw the co-ordinates move to a 0,0. so trying to figure that part out.

    Oh and you could also create style icons instead of placemarks if you wanted to ( that way you can separate/represent different request types with different icons )

    http://maps.google.com/mapfiles/ms/icons/red-dot.png

  • Tobias Says:

    Thanks Haarith – as I said, if you like this one, wait til you take a look at the one I posted this morning: http://www.kahunaburger.com/2009/04/07/google-earth-forensics/ ๐Ÿ˜‰

  • Haarith Devarajan Says:

    Final version. Thanks again for putting this up.

    use strict;
    use DBI;
    my %seen;
    my $dbh = DBI->connect("dbi:mysql:ipinfo","hd","hd");
    die "unable to connect to database" unless ($dbh);
    # kml header
    print qq{<?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://earth.google.com/kml/2.2">
    \t<Document>
    };
    print qq{<Style id="a">};
    print qq{<IconStyle>};
    print
    qq{<Icon><href>http://maps.google.com/mapfiles/ms/icons/green-dot.png</h
    ref></Icon>};
    print qq{</IconStyle>};
    print qq{ </Style>};
    # loop over access_log lines
    while(<>) {
       # does it look like an access log entry?
       next unless
    (/^(\d+)\.(\d+)\.(\d+)\.(\d+).*\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+HTTP\/\d.
    \d/);
       my($a,$b,$c,$d,$date,$method,$uri)=($1,$2,$3,$4,$5,$6,$7);
       # make sure we have a good ip address
       next if ($a < 0 || $a > 255 || $b < 0 || $b > 255 ||
                $c < 0 || $c > 255 || $d < 0 || $d > 255);
       # did we see this IP already?
       next if $seen{"$a.$b.$c.$d"}++;
       # compute value for ipinfo lookup
       my($val)=(($a*256+$b)*256+$c)*256;
       # fetch ipinfo data
       my($r)=$dbh->selectall_arrayref(qq{select
    country_code,region_code,city,latitude,longitude
           from ip_group_city where ip_start <= $val order by ip_start desc
    limit 1});
       # no iearth.google.complacemark!
       next if !defined($r);
       print qq{\t\t<Placemark>\n};
       print qq{\t\t\t<name>$a.$b.$c.$d</name>\n};
       print qq{\t\t\t<styleUrl>#a</styleUrl>\n};
     #    print qq{\t\t\t<description>$a.$b.$c.$d</description>\n};
       print qq{\t\t\t<description><![CDATA[};
       print qq{<b>$method $uri from $r->[0]->[0]/$r->[0]->[1]/$r->[0]->[2]
    at <i>$date</i><br />};
       print qq{]]></description>\n};
       print qq{\t\t\t<Point>\n};
       print
    qq{\t\t\t\t<coordinates>$r->[0]->[4],$r->[0]->[3]</coordinates>\n};
       print qq{\t\t\t</Point>\n};
       print qq{\t\t</Placemark>\n};
       print qq{\t\t\n};
    }
    # kml trailer
    print qq{\t</Document>
    </kml>
    };
    $dbh->disconnect();
    
  • Tobias Says:

    Ugh – those comments did not turn out too well. If you want to send the code to me, I’ll make sure it looks ok. Tks – T

  • Haarith Devarajan Says:

    Sent the code..offcourse edited the username and password to be hd .. should not have been lazy and put username and password in its place .. ๐Ÿ˜€

  • Tobias Says:

    Ok – I escaped it properly and also deleted the older trashed comment – thanks again ๐Ÿ™‚

  • Haarith Devarajan Says:

    Just thought i should let you know…This STILL rocks :)..i havent really found anything that matches the clean execution..Brilliant stuff Tobi ๐Ÿ™‚

Leave a Reply