Monday, June 10, 2013

Statistics Teacher Helper Tool: Generating Random Numbers and Normal Distributions

If all goes well, I'll be teaching my first Massive Open Online Course (MOOC) at the end of the summer. My course title is "Statistics in Education for Mere Mortals." It will be managed and distributed by Instructure using Canvas, their learning management system (LMS).

I chose this topic on the advice to pick something I love to teach but rarely get to teach as fully as I would like. I teach a little statistics in my research methods course for master's students. My instructional strategy there is to have students learn the fundamentals of statistics by building Excel spreadsheets from scratch. In my online course, I created some short video tutorials for students to follow. Then I send each of them a new data set, and they email me the resulting statistic. I'm going to follow the same approach in my MOOC, but with a few twists so that everything can be graded by the LMS.

Anyhow, the purpose of this blog posting is not to talk about my MOOC, but rather to share a little helper tool I created for it using LiveCode. I wanted to be able to create unique data sets. I started by creating a simple tool that generated random numbers. That was good, but what I really needed was a tool that would create a normally distributed data set, that is, one whose frequencies, when graphed, take on the shape of the classic bell curve. Now, that's a little trickier. How does one program a set of normally distributed numbers?

Well, Sir Francis Galton invented a device called the Galton Machine (also known as the Galton Box, the Bean Machine, and many other names) that I'll bet many of you either know about or have actually seen. Pachinko is a common name for a game based on the Galton Machine. The idea is that you drop marbles, one by one, through a grid of pins. As each marble falls, it strikes a pin and bounces either left or right, each with a probability of 50%, row after row, until it lands in one of the columns at the bottom. The more marbles you drop, the more closely the resulting piles resemble a normal curve.
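The marble-and-pin process is easy to sketch in a few lines of Python. This is not the Java simulation mentioned in this post, and the function name galton_board and its parameters are illustrative choices of my own. Each marble's final column is simply the number of times it bounced right, so the expected column counts follow a binomial distribution, which looks more and more like a bell curve as the number of rows grows; dropping more marbles makes the observed piles settle closer to that shape.

```python
import random
from collections import Counter
from typing import List, Optional


def galton_board(marbles: int, rows: int, seed: Optional[int] = None) -> List[int]:
    """Drop `marbles` marbles through `rows` rows of pins.

    At each pin a marble goes right with probability 1/2, so its final
    column is its number of rightward bounces (0..rows). Returns how many
    marbles landed in each column, left to right.
    """
    rng = random.Random(seed)
    landings = Counter(
        sum(rng.random() < 0.5 for _ in range(rows)) for _ in range(marbles)
    )
    return [landings[col] for col in range(rows + 1)]


if __name__ == "__main__":
    counts = galton_board(marbles=2000, rows=12, seed=1)
    # crude text histogram: one star per 10 marbles
    for col, n in enumerate(counts):
        print(f"{col:2d} {'*' * (n // 10)}")
```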

Click here to see a computer simulation (programmed with Java) of this device. Here's a screen snapshot of the simulation as it chugs along:


As the columns continue to grow, they will nicely match the shape of the bell curve.

So, I used this very concrete idea to program my own Galton Machine. I didn't have the time (or interest) to build an animated interface showing everyone what is happening. Instead, the program simply builds a column of numbers (left column). Each number begins at a chosen starting position, then moves up or down by a certain amount over a series of random events, with a 50/50 chance of going either way. I use 25 such random events, corresponding to the number of rows of pins the marbles fall through, but I could easily have used as many as I wanted. Here is the code:

      repeat 25 times
         // A Pachinko machine: one pass per row of pins
         // varNail is 1 (bounce left) or 2 (bounce right), 50/50
         put random(2) into varNail
         if varNail = 1 then
            put varNumber - random(varBounce) into varNumber
         else
            put varNumber + random(varBounce) into varNumber
         end if
      end repeat
      // append the landing spot as a new line at the end of the field
      put varNumber into line (the number of lines in field "fldRandomList" + 1) of field "fldRandomList"

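varNumber is the point on the x-axis where the marble is dropped; by the end of the loop, it holds the position where that marble lands.
varBounce is a variable I created to set the "bounciness" of the marble: on each bounce, the marble moves a random whole number of steps from 1 to varBounce. A value of 1 is like a regular marble, and 5 would be like hard rubber.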

The variable varNail is just a random number, either 1 or 2. When 1 comes up, the marble bounces left (i.e., the bounce amount is subtracted from varNumber). When 2 comes up, the marble bounces right (i.e., the bounce amount is added to varNumber).
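A quick calculation shows how varBounce controls the spread of the data set, assuming each left/right choice and each random(varBounce) draw are independent (as they are in the loop). Write $b$ for varBounce and $x_0$ for the starting value of varNumber. Each of the 25 steps moves the marble by $S_i U_i$, where $S_i$ is $-1$ or $+1$ with equal probability and $U_i$ is equally likely to be any whole number from 1 to $b$:

$$
X_{25} = x_0 + \sum_{i=1}^{25} S_i U_i, \qquad
E[X_{25}] = x_0, \qquad
\operatorname{Var}(X_{25}) = 25\,E[U^2] = 25\cdot\frac{1}{b}\sum_{k=1}^{b} k^2 = \frac{25\,(b+1)(2b+1)}{6}.
$$

So the landing spots are centered on the starting point, with a standard deviation of exactly 5 when $b = 1$ and about 16.6 ($\sqrt{275}$) when $b = 5$: bouncier marbles give a wider bell.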

The result is put into the next open line of the field "fldRandomList", which is one past its current last line (i.e., the number of lines in field "fldRandomList" + 1).

(The code for all of this is in the button "Generate Distribution.")

I give the user control over how many marbles are dropped, the starting point of the drop, and the bounciness of the marbles.
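For readers who don't use LiveCode, here is a rough Python equivalent of the whole generator. It assumes each marble starts fresh at the chosen starting point (the LiveCode fragment shows only one marble's trip), and the names generate_distribution, marbles, start, bounce, and rows are hypothetical stand-ins for the tool's three user settings plus the 25 rows of pins.

```python
import random
from typing import List, Optional


def generate_distribution(marbles: int, start: int, bounce: int,
                          rows: int = 25, seed: Optional[int] = None) -> List[int]:
    """Sketch of the 'Generate Distribution' logic (hypothetical port).

    Each marble starts at `start`; for each of `rows` rows of pins it moves
    left or right (50/50) by a random whole number from 1 to `bounce`.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(marbles):
        position = start                           # drop point, like varNumber
        for _ in range(rows):
            nail = rng.randint(1, 2)               # like random(2)
            if nail == 1:
                position -= rng.randint(1, bounce)  # bounce left
            else:
                position += rng.randint(1, bounce)  # bounce right
        results.append(position)                   # landing spot
    return results


if __name__ == "__main__":
    print(generate_distribution(marbles=40, start=50, bounce=1, seed=7))
```

With bounce=1, every marble lands within 25 units of the starting point; raising bounce widens the spread.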

Now, this would have been good enough, but I wanted a little more functionality. Here I partner with Excel again to take advantage of its graphing capabilities. I found a handy technique, created by folks at Santa Clara University, for graphing a set of numbers in Excel. However, it's a little clunky to have Excel first tally the frequencies of the numbers. So I added another column (right column) to do just that. (I had a devil of a time figuring this out. It seemed so simple when I first conceived of it, but it took me over an hour to get it right.)
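The tallying step is easier to see in a small sketch. Here is one way to do it in Python, assuming the values are whole numbers (as the marble generator produces) and that any value between the minimum and maximum that never occurs should still appear with a count of 0, so the chart has no missing bars. The helper frequency_table is hypothetical, not the tool's own code, and the tool's exact output layout isn't shown here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequency_table(values: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (value, count) pairs for every integer from min to max.

    Values that never occur get a count of 0, so the table has no gaps.
    """
    counts = Counter(values)
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    return [(v, counts[v]) for v in range(lo, hi + 1)]


if __name__ == "__main__":
    # tab-separated output pastes into two Excel columns
    for value, count in frequency_table([3, 5, 5, 4, 7, 5, 3]):
        print(f"{value}\t{count}")
```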

OK, here is a screen shot:

I also added some simple functionality, like automatically copying the frequencies in the right column to the clipboard for easy pasting into Excel. Here's the code for that:

   copy char 1 to -1 of field "fldFrequencies"

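(The "1 to -1" part works because LiveCode counts negative positions backward from the end of the text, so char -1 is the last character and the whole field gets copied.)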

I also programmed it to compute the mean and the range. (I almost programmed it to compute the standard deviation, but my heart wasn't in it. Plus, it is so easy to compute this with Excel.)
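For anyone who wants the standard deviation without a trip to Excel, Python's standard statistics module computes it directly. This sketch (the summarize helper is hypothetical) reports both versions of the standard deviation, since it matters which one a course expects: the sample version divides by n − 1 and the population version by n.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Mean, range, and both standard deviations for a data set."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        # sample SD (n - 1 in the denominator), like Excel's STDEV.S
        "sd_sample": statistics.stdev(values),
        # population SD (n in the denominator), like Excel's STDEV.P
        "sd_population": statistics.pstdev(values),
    }


if __name__ == "__main__":
    print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```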

I thought it might also be useful to sort the list of raw numbers (left column). It turns out this is a snap in LiveCode: just one line (in the button "Sort"):

      sort lines of field "fldRandomList" ascending numeric
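The word numeric in that line matters. Without it, LiveCode sorts lines as text, character by character, so "10" would come before "9". Here is a small Python illustration of the same pitfall, assuming the data are stored one number per line as text; sort_lines is a hypothetical helper for illustration only.

```python
def sort_lines(text: str, numeric: bool = True) -> str:
    """Sort one-number-per-line text, either numerically or as plain text."""
    lines = text.splitlines()
    key = float if numeric else None  # None means plain string comparison
    return "\n".join(sorted(lines, key=key))


if __name__ == "__main__":
    raw = "12\n9\n-3\n100"
    print(sort_lines(raw, numeric=False).split("\n"))  # text order
    print(sort_lines(raw).split("\n"))                 # numeric order
```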

I even added a little explanation; just click on the question "What's the Galton Machine?" The program is not as fast as I would like, so be careful if you want very large data sets. The largest set I tried was 500 numbers (i.e., marbles), which took well over a minute to crunch. But I think most data sets will have somewhere between 30 and 50 numbers.

If you teach statistics, you'll "probably" find this to be a useful tool.