Bias Variance Tradeoff

This is my first knitr document. Knitr lets you combine R code and text in a single formatted document.

I wanted to have an accessible example that illustrates the bias variance tradeoff.

An illustration of the Bias Variance Tradeoff


by Gene Leynes


The Bias Variance Tradeoff is an important concept in machine learning. It helps you evaluate which model will work best on data it has not seen yet.

When most people think of fitting a model, something like this comes to mind:
[Figure: a straight line fit through a scatter of points]

That is, you basically just draw the best straight line through some points. This paradigm makes it hard to imagine what someone would mean by “model selection”.

The bias variance problem arises when you start to use nonlinear models that don't have to follow straight lines.

If you consider this data fit with two different smoothing parameters:
[Figure: the same data fit with two different smoothing parameters, shown side by side]

you can get a sense of the problem.

Intuitively, the plot on the left seems to do a better job of representing the information contained in the data. However, the model on the right has absolutely no error on the points it was fit to.

This is the bias variance tradeoff.
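
Here is a minimal sketch of the same effect. It is not the code behind the plots above: the sine-wave data, the noise level, and the choice of base R's smooth.spline and splinefun are all assumptions made for illustration. The smooth fit plays the role of the left plot and the interpolating fit plays the role of the right plot.

```r
set.seed(1)

## Simulated data (an assumption for this sketch): a sine wave plus noise
n = 30
x = seq(0, 1, length.out = n)
truth = function(x) sin(2 * pi * x)
y = truth(x) + rnorm(n, sd = 0.3)

## A heavily smoothed fit and a fit that passes through every point
smoothFit = smooth.spline(x, y, spar = 0.8)
interpFit = splinefun(x, y)

mse = function(actual, predicted) mean((actual - predicted)^2)

## Error on the points used for fitting
trainSmooth = mse(y, predict(smoothFit, x)$y)
trainInterp = mse(y, interpFit(x))

## Error on fresh points drawn from the same process
xNew = runif(200)
yNew = truth(xNew) + rnorm(200, sd = 0.3)
testSmooth = mse(yNew, predict(smoothFit, xNew)$y)
testInterp = mse(yNew, interpFit(xNew))

print(c(trainSmooth = trainSmooth, trainInterp = trainInterp,
	testSmooth = testSmooth, testInterp = testInterp))
```

The interpolating fit has essentially zero error on the points it was fit to, but that says nothing about how it does on the fresh points, which is the comparison that matters when choosing a model.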


Installing StatET

EDIT: Completely ignore the advice below. RStudio is now the way to go for an R development environment. It was a viable alternative about a year after I wrote this post, and now it’s hands down the only way to go.

About StatET and Eclipse

StatET is a powerful plug-in that allows you to use R inside the Integrated Development Environment (IDE) known as Eclipse. The features in Eclipse make it easier to write code in R, unless perhaps you’re already using something more sophisticated.

Eclipse has a reputation for having a “steep learning curve”. However, I have found it to be useful even if you barely know what you’re doing. The more you learn, the more useful it becomes.

StatET has a reputation for being difficult to install. There are a few things that are tricky for non-programmers. Hopefully this post will make those things more obvious.

StatET is written by Stephan Wahlbrink. The official website and more detailed instructions can be found here:

System Requirements

I will be showing you how I installed the plug-in for Eclipse Indigo, using R 2.14.1. I’m using a Windows XP machine. The process is similar for Windows 7.

My Steps

How to upgrade to a new version of R

I updated to R 2.14.1 for the StatET instructions post (forthcoming). While doing that, I noticed some upgrading instructions in R’s Frequently Asked Questions.

[Image: upgrade text from R FAQ 2.8]

I gave it a try, but the results were a little annoying.  First of all, I had to be careful to copy over only my custom libraries, and not the core libraries (like “base” and “stats”).

Then, when I issued the update commands:
## The FAQ had ask=FALSE, but I wanted to see what was going on,
## so I set ask=TRUE
update.packages(checkBuilt=TRUE, ask=TRUE)

Unfortunately, the update.packages command updated nearly every custom package, and (oddly) a few core packages as well.  Also, I was expecting “update” to mean “just update missing files”. However, “update” meant “download the whole package and install from scratch”. So it didn’t save time or bandwidth.

I found it easier to run these commands to list the folders that are in the old library, but not in the new one:
OldFolders = list.files('C:/Documents and Settings/Gene/My Documents/R/win-library/2.13')
NewFolders = list.files('C:/Program Files/R/R-2.14.1/library')
OldFolders[!OldFolders %in% NewFolders]

Note that in 2.14 they seem to have gone back to storing the libraries under “Program Files” rather than in “My Documents”. I think the original switch to “My Documents” was a workaround to avoid needing admin privileges every time you install a new package / library.

Then I manually installed the libraries one by one using “install.packages”, e.g.:

The manual installation is useful because
•    Some of the libraries might not be available on CRAN
•    You might not need all your old libraries
•    Some libraries install their dependencies, so you can skip installing those dependencies separately

Every so often I would rerun the OldFolders / NewFolders code to check what was still needed.
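
If you expect to rerun that check a lot, the comparison can be wrapped in a small function. This is only a sketch: MissingLibraries is a made-up name, and the demo uses throwaway folders under tempdir() with placeholder package names (pkgA, pkgB) instead of a real R library.

```r
## Hypothetical helper: folder names found in the old library but not in the new one
MissingLibraries = function(oldLib, newLib){
	setdiff(list.files(oldLib), list.files(newLib))
}

## Demo on temporary folders so nothing on the real system is touched
oldLib = file.path(tempdir(), 'old-library')
newLib = file.path(tempdir(), 'new-library')
for (d in file.path(oldLib, c('base', 'pkgA', 'pkgB'))){
	dir.create(d, recursive=TRUE, showWarnings=FALSE)
}
for (d in file.path(newLib, c('base', 'pkgA'))){
	dir.create(d, recursive=TRUE, showWarnings=FALSE)
}

stillNeeded = MissingLibraries(oldLib, newLib)
print(stillNeeded)
```

With real library paths in place of the temporary ones, each name that comes back is a candidate for install.packages.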

Use R to choose your secret santa partner

Ok, so you want to choose your secret santa partners, but you can’t find a hat? Well, here is an R script that can swoop in to your rescue.

This isn’t the most elegant or efficient code, but unless you have a really huge family it won’t take long to run.

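ChooseSS = function(people, avoidmatch){
	## Reshuffle whenever someone draws their own name (if / while conditions reconstructed)
	permuteMyPeople = function(peeps){
		PeepsPermuted = sample(peeps)
		if(any(PeepsPermuted == peeps)){
			PeepsPermuted = permuteMyPeople(peeps)
		}
		PeepsPermuted
	}
	cbindMyPermutedPeople = function(peeps){
		cbind(p1=peeps, p2=permuteMyPeople(peeps))
	}
	ret = cbindMyPermutedPeople(people)
	## m1 / m2: rows where each avoided pair appears as giver (p1) / receiver (p2)
	m1 = sapply(avoidmatch, match, ret[,1])
	m2 = sapply(avoidmatch, match, ret[,2])
	## Redraw while anyone in an avoided pair gives to their partner
	while(any(m1[1,] == m2[2,]) || any(m1[2,] == m2[1,])){
		ret = cbindMyPermutedPeople(people)
		m1 = sapply(avoidmatch, match, ret[,1])
		m2 = sapply(avoidmatch, match, ret[,2])
	}
	ret
}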

And, you can run it with this “example” family:

family = c('Dick', 'Bonnie', 'Suzy', 'Jeff', 'Amy', 'Mike',
	'Kindy','Gene','Emily','Joe', 'Courtney', 'Meghann')
avoidmatch = list(c('Mike', 'Amy'), c('Suzy', 'Jeff'), c('Courtney', 'Meghann'),
	c('Dick', 'Bonnie'))
ChooseSS(family, avoidmatch)
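
If you want to double check a draw before sending out the emails, something like the sketch below could help. isValidDraw is a made-up helper, not part of the script above. It assumes the draw is a two-column matrix of givers and receivers, and the four-person draws here are hand-built examples.

```r
## Hypothetical checker: TRUE when a giver / receiver matrix is a valid draw
isValidDraw = function(draw, avoidmatch){
	givers = draw[,1]
	receivers = draw[,2]
	## Everyone gives once and receives once
	everyoneOnce = setequal(givers, receivers) && !anyDuplicated(receivers)
	## Nobody draws their own name
	noSelf = all(givers != receivers)
	## Build "giver>receiver" keys for both directions of every avoided pair
	forbidden = unlist(lapply(avoidmatch, function(p){
		c(paste(p[1], p[2], sep='>'), paste(p[2], p[1], sep='>'))
	}))
	noAvoided = !any(paste(givers, receivers, sep='>') %in% forbidden)
	everyoneOnce && noSelf && noAvoided
}

## Hand-built examples: the second one pairs A with B, which is not allowed
pairs = list(c('A', 'B'), c('C', 'D'))
good = cbind(p1=c('A', 'B', 'C', 'D'), p2=c('C', 'D', 'B', 'A'))
bad = cbind(p1=c('A', 'B', 'C', 'D'), p2=c('B', 'A', 'D', 'C'))
print(isValidDraw(good, pairs))
print(isValidDraw(bad, pairs))
```

Because the key uses '>' as a separator, this sketch also assumes nobody's name contains that character.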