[http://en.wikipedia.org/wiki/Crowdsourcing Crowdsourcing] refers to using the general population to carry out tasks, often for free. The idea is usually that by combining many inputs of unknown or variable quality, high quality outputs can be created, either through selection or some form of organisation.
One of the most famous pieces of crowdsourcing is [http://en.wikipedia.org Wikipedia], which leverages the distributed knowledge of the internet, along with a strict set of guidelines about how to integrate contributions, to create a high quality knowledge base. In the realm of geodata, possibly the most famous example is [http://wiki.openstreetmap.org OpenStreetMap], a completely open competitor to Google Maps. Here, road names and locations are entered by the general public and annotated with other useful information, collaboratively creating a detailed and accurate map. Because the maps are community driven, the information that gets added tends to be what is useful to the community. Two examples:
* [http://www.freethepostcode.org/ Free the Postcode] is an attempt to recreate the postcode database that Royal Mail want to charge large amounts of money for, and put it [http://random.dev.openstreetmap.org/postcodes/ on a map]:
As modellers of human behaviour, we quite often want to understand what people's preferences and desires are, as these underpin many of their decisions. Typically, this is done by sending out questionnaires, which may be qualitative or quantitative, and may use techniques such as [wiki:Conjoint_analysis]. This is generally a high cost, low return activity, meaning that many models are built using data from fewer than 100 participants.
When looking at [http://www.mysociety.org/ MySociety]'s excellent [http://mapumental.channel4.com Mapumental] application, I saw that they had combined travel times (on public transport), house prices and "scenicness" to help people decide where to live. I could see how they could calculate travel times: get hold of public transport schedules, then [http://www.cs.sunysb.edu/~skiena/combinatorica/animations/dijkstra.html search] for shortest times. I could understand getting hold of house price data and spatialising it ([http://www.mysociety.org/2009/08/18/how-mapumental-works/ here's more detail on the infrastructure]). I didn't understand how they had put together "scenicness" data across the UK: that's a highly subjective measurement, and gathering data with the necessary resolution and depth seems daunting.
That's where the crowdsourcing came in. Following the trend of ratings sites such as "Hot or not?", where the web at large is asked to rate people according to attractiveness, MySociety had set up "[http://scenic.mysociety.org/ ScenicOrNot]". This uses the geotagged photographs from [http://www.geograph.org.uk/ Geograph] (which aims to have at least one *representative* photograph in each grid square of the UK), and presents visitors with an image to rate. These ratings are combined with the latitude and longitude of the photograph to give a map of scenicness across the UK.
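To make the "ratings plus latitude and longitude gives a map" step concrete, here is a minimal Python sketch. It is not MySociety's actual method: the function names, the 0.5 degree cell size and the sample ratings are all made up for illustration, and a plain latitude/longitude grid is a simplification (the cells are not equal in area, and Geograph itself works in national grid squares).

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, cell_deg=0.5):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def mean_rating_by_cell(ratings, cell_deg=0.5):
    """Average the ratings that fall in the same lat/lon grid cell.

    `ratings` is an iterable of (lat, lon, rating) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of ratings, count]
    for lat, lon, rating in ratings:
        cell = grid_cell(lat, lon, cell_deg)
        totals[cell][0] += rating
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Made-up ratings for illustration only; not real ScenicOrNot data.
sample = [
    (51.51, -0.13, 3.0),
    (51.52, -0.11, 5.0),
    (57.10, -4.70, 9.0),
]
cells = mean_rating_by_cell(sample)
```

Each key in `cells` identifies one grid cell and each value is the mean rating of the photographs that fall in it, which is the sort of table a plotting library can then colour in.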
It's not a perfect system (it's easy to confuse scenicness with "quality of photo" or "weather in the photo"), but it provides a fun and interesting data set. They make the [http://scenic.mysociety.org/votes.tsv data available] (although the header line needs converting to be a proper tab-separated value file), and provide a [http://scenic.mysociety.org/leaderboard leaderboard] of the most and least scenic places in the UK. They don't give a map of the data, though, which is a shame, so here's a quick version I made in [http://had.co.nz/ggplot2/ ggplot2], without doing any proper GIS work:
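For anyone wanting to reproduce a plot like this, the file first has to be loaded. The plot itself was made in ggplot2; what follows is only a rough Python sketch of the header fix mentioned above. It assumes the header's only problem is its delimiter (spaces or commas instead of tabs, possibly with a leading `#`) and that column names contain no spaces. The column names `id`, `lat`, `lon`, `average` and `votes` and the sample rows are hypothetical, so check them against the real file.

```python
import csv
import io
import re


def normalise_header(header_line):
    """Turn a loosely delimited header into a tab-separated one.

    Assumption: the header differs from the data rows only in its
    delimiter, and may start with a '#'.
    """
    names = re.split(r"[,\s]+", header_line.strip().lstrip("#").strip())
    return "\t".join(names)


def read_votes(text):
    """Parse TSV text whose first line is a malformed header."""
    header, _, body = text.partition("\n")
    fixed = normalise_header(header) + "\n" + body
    reader = csv.DictReader(io.StringIO(fixed), delimiter="\t")
    return list(reader)


# Made-up sample in the assumed layout; not real ScenicOrNot data.
SAMPLE = (
    "# id lat lon average votes\n"
    "1\t51.50\t-0.12\t3.2\t5\n"
    "2\t57.10\t-4.70\t8.6\t7\n"
)

rows = read_votes(SAMPLE)
```

Every value in `rows` comes back as a string, so latitude, longitude and rating need converting with `float()` before plotting.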
As you can see, it's pretty complete coverage of the UK. There's a fair correlation between urban areas and low scenicness, and also between elevation and scenicness. And this is really the power of crowdsourcing: to get anywhere near this level of coverage (>170k points) would be time consuming and expensive to do as a standard data gathering exercise. By making a playful task which people will carry out for fun, suddenly vast amounts of data become available at very low cost. There is, of course, a definite issue around the quality of the data: there are no controls, no pre-questionnaire to assess each subject's background, and so on. So it's not a replacement for rigorous, controlled data collection. But as an additional tool in the armoury, it's hard not to be excited about the possibilities.
=== Link Collection ===