Submitted by dave on
I often find that I want to do more advanced queries of airports/cities than search engines allow, e.g. find me large cities which I can fly to from both these places and not have to travel more than 50km at the end. I'm using this as an excuse to learn:
- Scala - it sounds cool, the concurrency looks really nice, could work nicely for agent interaction and it's JVM based so I feel at home
- MongoDB - seems interesting, has geospatial support
- Lift - it's about time I got into a web framework again (since handrolling stuff in perl years ago...)
I found the very exciting OpenFlight data, which should form the main body of data to query.
Importing data into MongoDB [edit]
OpenFlights provides .dat files, which are CSV files with all the info in. I've added headers to these in line with their description, e.g.
ID, Name, City, Country, IATA-FAA, ICAO, Latitude, Longitude, Altitude, Timezone, DST 1,"Goroka","Goroka","Papua New Guinea","GKA","AYGA",-6.081689,145.391881,5282,10,"U" 2,"Madang","Madang","Papua New Guinea","MAG","AYMD",-5.207083,145.7887,20,10,"U"
Assuming that you have mongodb running, these can be loaded into collections as follows:
>mongoimport -d airportdb -c airports --type csv --headerline airports.dat then
> db.airports.find().limit(2).forEach( printjson )
{
"_id" : ObjectId("4d933b7f732fd2527b726f6c"),
"ID" : 1,
"Name" : "Goroka",
"City" : "Goroka",
"Country" : "Papua New Guinea",
"IATA-FAA" : "GKA",
"ICAO" : "AYGA",
"Latitude" : -6.081689,
"Longitude" : 145.391881,
"Altitude" : 5282,
"Timezone" : 10,
"DST" : "U"
}
...I've now got 3 databases: airports, airlines and routes.
In order to use the geospatial side of MongoDB, we need each object to have a subobject with lat/long in. This turns out to be a little more involved than I was expecting, but a good way to get the hang of map/reduce:
m = function() { emit( this._id, this ) }
r = function(key, values){ values[0].loc = { lat: values[0].Latitude, lon: values[0].Longitude }; return values[0] }
db.airports.mapReduce( m, r, {out: "airports"});This is a very simple version of map reduce, where:
- the map function just returns the whole object
- the reduce function adds the subfield, and then returns the whole object
- mapReduce uses both of these, and stores the result in the collection "airports", which is where it came from. This overwrites the original data with the original data + additions.
Slightly convoluted, but it does the job. Also, since this is only really happening on import, speed isn't too much of an issue.
- UPDATE* This returns objects as { key: ObjectID, value: (actual object) }, so it's not a direct replacement for the original collection. I could put it into a temporary collection and iterate (we're not supposed to do db access in map/reduce any more). But, in the spirit of learning, here's an alternative approach:
makeCoord = function(e) { e.loc={ lat: e.Latitude, lon: e.Longitude }; db.airports.save( e ); }
db.airports.find().forEach( makeCoord )This feels a bit closer to what I originally had in mind, but it seems to be slower; I think it has to pull every object in, modify it and send it back. Now I can find the airports close to Edinburgh:
edin = { lat: 55.57, lon:-3.13 }
db.airports.find( { loc: { $near : edin, $maxDistance:1 } } )
{ "_id" : ObjectId("4d9359ed732fd2527b74a278"), "ID" : 7388, "Name" : "Edinburgh Waverly Station", "City" : "Edinburgh", "Country" : "United Kingdom", "IATA-FAA" : "", "ICAO" : "\\N", "Latitude" : 55.952, "Longitude" : -3.189, "Altitude" : 0, "Timezone" : 0, "DST" : "U", "field11" : 75.1487722, "field12" : 1, "field13" : -5, "field14" : "A", "field15" : 389, "field16" : -55.829722, "field17" : 1873, "field18" : -3, "field19" : "U", "loc" : { "lat" : 55.952, "lon" : -3.189 } }
{ "_id" : ObjectId("4d9359ed732fd2527b748d3f"), "ID" : 535, "Name" : "Edinburgh", "City" : "Edinburgh", "Country" : "United Kingdom", "IATA-FAA" : "EDI", "ICAO" : "EGPH", "Latitude" : 55.95, "Longitude" : -3.3725, "Altitude" : 135, "Timezone" : 0, "DST" : "E", "loc" : { "lat" : 55.95, "lon" : -3.3725 } }OK, so the database seems to contain Waverley station, but it's generally working.
Now we can start getting into actual data manipulation. First, I can make a collection of destination arrays - the airports you can reach from each airport listed:
makeDest = function (source) {
var dst = db.routes.distinct("Destination", {Source:source});
var o = {Source:source, Dests:dst};
db.destinations.save(o);
}
db.routes.distinct( "Source" ).forEach( makeDest )This creates a collection of objects with an airport name and an array of destinations reachable. Given this, I can compute the airports which you can reach from both startpoints:
fromBoth = function (a, b) {
ad = db.destinations.findOne({Source:a}).Dests;
bd = db.destinations.findOne({Source:b}).Dests;
return ad.filter(function (n) {return bd.indexOf(n) != -1;});
}
fromBoth( "EDI", "CPH" );
[
"AGP",
"ALC",
"AMS",
...This is quite basic, but it's a good start. Next, I'll look at putting this into a Lift webapp...
