Apache access_log to Google Earth KML
Where are the visitors to kahunaburger.com coming from? I admitted in the past that I’m log-file-junkie. There’s usually a terminal window on desktop that runs multitail (or my own airlog) against a number of log-files on various servers. Line after line scrolls by as people hit those servers. The ip-address does not tell you too much about the visitor and I always wanted to see where those ip-addresses are located.
Just the other day I saw link to a free IP address geolocation SQL database (thanks Marc for making that one available). I downloaded the 11MB file and added the database to my mysql server.
Next, I created a simple perl script, that walks over my web servers (apache) access_log, extracts ip-addresses, access-date/time and url and finally converts all those items (using above mentioned database) into a KML file that can be fed to Google Earth.
The result looks like this in Google Earth:
And here’s the script that does the magic (it assumes that you have stored the database in “ipinfo”):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | use strict; use DBI; my %seen; my $dbh = DBI->connect("dbi:mysql:ipinfo","username","password"); die "unable to connect to database" unless ($dbh); # kml header print qq{< ?xml version="1.0" encoding="UTF-8"?> <kml xmlns="http://www.opengis.net/kml/2.2"> \t<document> }; # loop over access_log lines while(<>) { # does it look like an access log entry? next unless (/^(\d+)\.(\d+)\.(\d+)\.(\d+).*\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+HTTP\/\d.\d/); my($a,$b,$c,$d,$date,$method,$uri)=($1,$2,$3,$4,$5,$6,$7); # make sure we have a good ip address next if ($a < 0 || $a > 255 || $b < 0 || $b > 255 || $c < 0 || $c > 255 || $d < 0 || $d > 255); # did we see this IP already? next if $seen{"$a.$b.$c.$d"}++; # compute value for ipinfo lookup my($val)=($a*256+$b)*256+$c; # fetch ipinfo data # WARNING! for whatever reason the code-beautifier adds an extra space between the < and = below # WARNING! that space has to be removed in your code! my($r)=$dbh->selectall_arrayref(qq{select country_code,region_code,city,latitude,longitude from ip_group_city where ip_start< =$val order by ip_start desc limit 1}); # no information? no placemark! next if !defined($r); print qq{\t\t<Placemark>\n}; print qq{\t\t\t<name>$a.$b.$c.$d</name>\n}; print qq{\t\t\t<description>\n< ![CDATA[\n}; print qq{<b>$method $uri from $r->[0]->[0]/$r->[0]->[1]/$r->[0]->[2] at <i>$date</i><br />\n}; print qq{]]>\n\t\t\t</description>\n}; print qq{\t\t\t<point>\n}; print qq{\t\t\t\t<coordinates>$r->[0]->[4],$r->[0]->[3]</coordinates>\n}; print qq{\t\t\t</point>\n}; print qq{\t\t\n}; } # kml trailer print qq{\t</document> </kml> }; $dbh->disconnect(); |
And you run the above script via:
perl log2kml.pl < access_log > output.kml
Here’s a little sample file from my web-server. Each ip-address is only recorded once, so if you have the same person visit your site several times, only the first hit will be shown and subsequent ones are ignored: log2kml.kmz (51K – click to open it in Google Earth)
Next up is a version that does live-tracking: as the web server is hit, Google Earth will automatically rotate to the location associated with the ip-address ๐
March 28th, 2009 at 10:55 am
[…] played a bit more with the idea that I presented in “Apache access_log to Google Earth KML” and, I think, I came up with something extremely […]
April 7th, 2009 at 5:26 pm
Great Stuff, thanks.
There is small typo in the code though. not really a typo more of a space.
here ip_start < = $val
should be
here ip_start <= $val
otherwise we get
here ip_start select country_code from ip_group_city where ip_start < = 14203352 order by ip_start desc limit 1;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near โ= 14203352 order by ip_start desc limit 1โฒ at line 1
April 7th, 2009 at 5:30 pm
Thanks for catching this one – no idea why the copy & paste added that extra space.
Cheers – Tobias
PS: something (or somebody) does not want to keep the <= characters together - I added a warning in the code and need to figure out what keeps on inserting that extra space ...
April 7th, 2009 at 6:06 pm
Just Brilliant Tobi. Great work.
I am still playing with your code. saw the co-ordinates move to a 0,0. so trying to figure that part out.
Oh and you could also create style icons instead of placemarks if you wanted to ( that way you can separate/represent different request types with different icons )
http://maps.google.com/mapfiles/ms/icons/red-dot.png
April 7th, 2009 at 6:13 pm
Thanks Haarith – as I said, if you like this one, wait til you take a look at the one I posted this morning: http://www.kahunaburger.com/2009/04/07/google-earth-forensics/ ๐
April 8th, 2009 at 5:28 pm
Final version. Thanks again for putting this up.
April 8th, 2009 at 5:42 pm
Ugh – those comments did not turn out too well. If you want to send the code to me, I’ll make sure it looks ok. Tks – T
April 8th, 2009 at 6:22 pm
Sent the code..offcourse edited the username and password to be hd .. should not have been lazy and put username and password in its place .. ๐
April 9th, 2009 at 9:43 am
Ok – I escaped it properly and also deleted the older trashed comment – thanks again ๐
August 26th, 2010 at 9:37 am
Just thought i should let you know…This STILL rocks :)..i havent really found anything that matches the clean execution..Brilliant stuff Tobi ๐