Exploratory Data Visualisation with R, ggplot and ggobi Part 2

In the [http://www.mo-seph.com/blog/datavis1 last post], about my work with Andreas and [http://www.ed.ac.uk/schools-departments/geosciences/people?indv=1578&cw_xml=person.html Marc], we saw how a bit of data visualisation helped explain why some output looked odd, and how the choice of classifier contributed to some strange behaviour. In that case we were lucky to spot the problem at all. What we want now are better ways to spot and avoid such problems in future, so this post looks at another way to examine the data and check that it looks reasonable: some "defensive visualisation".

== Class Means ==

What was the real "problem" with the original classification? Essentially, points were being assigned to classes that were far away from them in the space of input variables. Staying with visualisation (rather than statistics), it might be useful to look at the distance between each point and the mean of the class it is assigned to. So, again with R, ggplot and ggobi, what we can do is:
* take the baseline data
* compute the means of each class
* display the distance between each point and the mean of its class
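As a rough illustration of those three steps, here is a small base-R sketch on made-up data. The toy data, the three classes and the column names `x1` and `x2` are assumptions, not the data from this post. It uses `ave()` to give every row the mean of its own class, subtracts that, and combines the per-variable offsets into a Euclidean distance:

```r
# Sketch of the three steps on a small made-up data set.
# Assumption: one row per point, a factor column "Class" and numeric
# input variables; x1 and x2 are hypothetical names.
set.seed(42)
pts <- data.frame(
  Class = factor(rep(c("A", "B", "C"), each = 4)),
  x1 = rnorm(12),
  x2 = rnorm(12)
)
vars <- c("x1", "x2")

# ave() returns, for every row, the mean of that row's class,
# so subtracting it gives each point's offset from its class mean
offsets <- sapply(vars, function(v) pts[[v]] - ave(pts[[v]], pts$Class))

# Euclidean distance between each point and the mean of its class
pts$distToMean <- sqrt(rowSums(offsets^2))

# One summary per class: the largest distance of any member
print(tapply(pts$distToMean, pts$Class, max))
```

With real data, `vars` would hold the input variables and `Class` the assigned class; the per-class maximum is just one simple way to flag classes whose members sit far from their mean.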

I'm going to use "error" here to talk about the difference between a point and the mean of the class it is assigned to.
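Written out as formulas (the symbols below are introduced here for illustration), the class mean, the per-variable error and one possible way of combining the errors into a single distance are:

```latex
% x_{ij}: value of input variable j for point i
% c(i):   class that point i is assigned to
% n_c:    number of points assigned to class c
\begin{aligned}
\bar{x}_{c j} &= \frac{1}{n_c} \sum_{k \,:\, c(k) = c} x_{k j} \\
e_{i j} &= x_{i j} - \bar{x}_{c(i) j} \\
d_i &= \sqrt{\sum_{j=1}^{4} e_{i j}^{2}}
\end{aligned}
```

The plotting code in this post works with the per-variable errors e_ij (one panel per variable); d_i is only an assumed way of summarising them as one number.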

This should give us some idea of the structure of the classes. It's hard to visualise, though: with 125 classes and 4 variables, there is a lot of data to take in.

== Computing Means and Differences ==

First, we add the differences from the class means to a set of data points, so that we can make the comparisons.

[geshifilter-r]
library(ggplot2)   # ggplot(), geom_jitter(), theme(), facet_wrap()
library(reshape2)  # melt()
#Using the data from last time, and the function below
means <- calculateClassMeans( origA ) 
#Read a file with 500 points from each class
data <- read.csv("data/Current500Strat.csv") 
# and reformat it to match output data
data <- alterBaseline( data ) 
#Create a data.frame of just the classes of all the points in the sample
classes <- data.frame( Class=data$Class )
# ...
	baseDisp$Type <- "Baseline"
	classDisp <- getDiffData( classified )
	classDisp$Type <- "Classified"
	displayData <- rbind( baseDisp, classDisp )
	ggplot( melt( displayData,id.vars=c("newpointid","Class","Type" ) ), aes(x=Class,y=value,colour=Type) ) + 
			geom_jitter(position=position_jitter(height=0,width=0.3),alpha=0.1,size=1.3) + 
			theme(axis.text.x = element_text(angle = 90, hjust = 0)) +
			facet_wrap( ~ variable, scales="free_y" )
}

getDiffData <- function(data)
{
	data[,c("Class","newpointid","DIFF_AI_AGG_5M", "DIFF_PET_AGG_5M", "DIFF_TAB0_AGG_5", "DIFF_TSD_AGG_5M")]
}
[/geshifilter-r]
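The listing above does not include a definition of `calculateClassMeans()` or of the code that creates the `DIFF_*` columns read by `getDiffData()`. The sketch below is a guess at what such helpers could look like, not the originals: `addClassDiffs()`, the `vars` argument and the example columns `AI` and `PET` are all hypothetical. Class means are built one class at a time with `colMeans()`, and each point then looks up its class mean with `match()`:

```r
# Hypothetical helper: mean of each variable in `vars` for every class,
# returned as a data.frame with one row per class plus a "Class" column
calculateClassMeans <- function(data, vars) {
  data$Class <- factor(data$Class)
  means <- NULL
  for (cl in levels(data$Class)) {
    means <- rbind(means, colMeans(data[data$Class == cl, vars, drop = FALSE]))
  }
  means <- data.frame(means)
  means$Class <- levels(data$Class)
  means
}

# Hypothetical helper: add a DIFF_<var> column holding each point's
# difference ("error") from the mean of its assigned class
addClassDiffs <- function(data, means, vars) {
  idx <- match(as.character(data$Class), means$Class)
  for (v in vars) {
    data[[paste0("DIFF_", v)]] <- data[[v]] - means[[v]][idx]
  }
  data
}

# Tiny example with hypothetical column names
d <- data.frame(Class = c("a", "a", "b", "b"),
                AI = c(1, 3, 10, 14),
                PET = c(2, 2, 5, 7))
m <- calculateClassMeans(d, c("AI", "PET"))
d <- addClassDiffs(d, m, c("AI", "PET"))
print(d)
```

Because `addClassDiffs()` keeps every existing column, identifiers such as `newpointid` and `Class` would still be available for `getDiffData()`; the resulting names only match its `DIFF_*` selection if the input columns are named accordingly. Growing `means` with `rbind()` in a loop is fine for 125 classes; `aggregate()` is a more compact alternative.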


