LearningLiveCode: Using LiveCode to Create a Word Cloud (Almost) with a Nod Toward Data Mining

OK, it was 5:30 p.m. on Friday and I was completely spent mentally after a three-hour faculty meeting. I wanted to end the work week on an upbeat note, so I decided to take a few minutes to work with LiveCode. I had spent all of my free time this week working on my Q sort tool, so I wanted to do something brand-new, and I started a LiveCode project that had recently been on my mind. On Thursday I had attended an excellent presentation about data mining by one of our very talented doctoral students, Neo Hao, at the Design, Development and Research Conference hosted at the University of Georgia (and chaired by Dr. Rob Branch). Neo's presentation made me think about how easy it would be to build a simple example of a data mining program: a word cloud based on the frequency of words in a given passage of text. This post is the result of only about 30 minutes of work. I didn't finish the project, but I was able to build the basics, and I think it is an interesting example of using the excellent list processing capabilities of LiveCode. (Interestingly, after I started writing this post I worked at least another hour on the program's aesthetic stuff, like fonts, labels, and some user feedback, in order to make the project "presentable" in a blog posting. That is the way it always is with software design: the time needed to make a program work is always much less than the time needed to make it usable.)

I'm sure you know what a word cloud is, but if not, here's an example built with wordle.net and based on titles of some of the things I've published over the past few years:

As you can see, a word cloud is simply a listing of all the unique words in the passage with a visual representation of the frequency of the words. The more times a word is used, the bigger its font size. A word cloud is an excellent visual representation of the importance of certain words in a given passage of text, and I think you can get a good, quick snapshot of what I've been writing about in my published work just by scanning this image. Notice, by the way, that inconsequential words such as "a," "an," "the," "and," and the like are not represented. Also notice that punctuation has been stripped out. These become important points for us to consider later on.
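The "more uses, bigger font" rule is simple enough to sketch in LiveCode. The function below is only an illustration: textSizeForCount and its parameters are hypothetical names, and the linear scaling is an assumption (wordle.net may well size its words differently).

```livecode
-- Hypothetical helper: map a word's count to a text size between
-- pMinSize and pMaxSize, scaling linearly with the count.
-- pMaxCount is the count of the most frequent word in the passage.
function textSizeForCount pCount, pMaxCount, pMinSize, pMaxSize
   -- If every word appears only once, there is nothing to scale
   if pMaxCount <= 1 then return pMinSize
   return round(pMinSize + (pCount - 1) / (pMaxCount - 1) * (pMaxSize - pMinSize))
end function
```

With sizes running from 10 to 48, a word used once gets size 10 and the most frequent word gets size 48. The result could then be applied to a word in a field with something like set the textSize of word 3 of field "cloud" to textSizeForCount(5, 12, 10, 48), where field "cloud" is an assumed field name.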

Now, don't get your hopes up that I'm going to show how to build a word cloud as elegant as the wordle.net example. In fact, all I've built so far is a little program that takes a passage of text, figures out all of the unique words, and computes the number of times each one is used. It also strips out all of the words you decide are inconsequential and provides some quick summary data, such as the total number of words in the passage and the total number of unique words.
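For comparison, here is a minimal sketch of the same counting idea using a LiveCode array keyed by word. It is not the code in the stack, which works through fields and nested repeat loops. The handler names (stripPunctuation, countWords, showFrequencies) are hypothetical, and lowercasing every word is my assumption. The field names are the ones used on the card.

```livecode
-- Hypothetical helper: strip punctuation from both ends of one word.
-- Looping handles stacked punctuation, e.g. a period followed by a quote.
function stripPunctuation pWord
   local tPunct
   put ".,;:?!()" & quote into tPunct
   repeat while pWord is not empty
      if the last char of pWord is not in tPunct then exit repeat
      delete the last char of pWord
   end repeat
   repeat while pWord is not empty
      if the first char of pWord is not in tPunct then exit repeat
      delete the first char of pWord
   end repeat
   return pWord
end function

-- Hypothetical helper: return an array of counts keyed by word,
-- skipping any word listed (one per line) in pIgnoreList.
function countWords pText, pIgnoreList
   local tCounts, tWord, tClean
   repeat for each word tWord in pText
      put toLower(stripPunctuation(tWord)) into tClean
      if tClean is empty then next repeat
      if tClean is among the lines of pIgnoreList then next repeat
      add 1 to tCounts[tClean] -- an empty element counts as 0
   end repeat
   return tCounts
end function

-- Hypothetical usage: fill the card's fields from the array.
command showFrequencies
   local tCounts, tWord, tReport
   put countWords(field "original", field "ignore") into tCounts
   repeat for each line tWord in the keys of tCounts
      put tWord & tab & tCounts[tWord] & return after tReport
   end repeat
   set the itemDelimiter to tab
   sort lines of tReport descending numeric by item 2 of each
   put the keys of tCounts into field "unique words"
   put tReport into field "word frequencies"
end showFrequencies
```

The array needs only one pass over the passage, because each word is looked up by key rather than compared against a growing list. The field-based version is slower but lets you watch each step on the card.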

Here's a screenshot of the program using Abraham Lincoln's second inaugural address:

Get My LiveCode file

[ Get the free LiveCode Community version. ]

How It Works

The card consists of the following main fields displayed left-to-right on the screen:

original - this field contains the original passage of text you want to analyze;
unique words - this field contains all unique words found in the original passage;
word frequencies - this field contains the unique words plus their frequencies;
ignore - this field (with a reddish background color) lists all words you want to be ignored when identifying unique words.

All of the code is contained within the button "Analyze." The program works with several repeating loops, which I've color coded blue and green:

on mouseUp

put empty into field "unique words"

put empty into field "target word"

put empty into field "word frequencies"
put empty into field "unique count"

put empty into field "frequency count"

put the number of words in field "original" into L

//Search for Unique Words; Remove Punctuation

put "Finding unique words..." into field "working" //user feedback

show field "working"

repeat with i=1 to L

put empty into field "target word"

put word i of field "original" into varTargetWord

//Strip out any punctuation found in the word

if the last character of varTargetWord = comma then delete the last character of varTargetWord

if the last character of varTargetWord = "." then delete the last character of varTargetWord

if the last character of varTargetWord = ";" then delete the last character of varTargetWord

if the last character of varTargetWord = "?" then delete the last character of varTargetWord

if the last character of varTargetWord = "!" then delete the last character of varTargetWord

if the last character of varTargetWord = ":" then delete the last character of varTargetWord

if the last character of varTargetWord = quote then delete the last character of varTargetWord

if the first character of varTargetWord = quote then delete the first character of varTargetWord

if the first character of varTargetWord = "(" then delete the first character of varTargetWord

if the last character of varTargetWord = ")" then delete the last character of varTargetWord

//Check to see if the word has already been found

put the number of lines in field "unique words" into LL

put true into varUniqueWordFound

repeat with j=1 to LL

if varTargetWord = line j of field "unique words" then

put false into varUniqueWordFound

exit repeat

end if

end repeat

if varUniqueWordFound is true then put varTargetWord&return after field "unique words"

end repeat

//Compute the frequencies of each unique word found

put "Computing frequencies..." into field "working" //user feedback

put the number of lines in field "unique words" into L

repeat with i = 1 to L

put line i of field "unique words" into varUniqueWord

//First, look for inconsequential words and strip out

put the number of lines in field "ignore" into LLL

put false into varWordToIgnoreFound

repeat with k= 1 to LLL

put line k of field "ignore" into varWordToIgnore

if varUniqueWord=varWordToIgnore then

put true into varWordToIgnoreFound

exit repeat

end if

end repeat

if varWordToIgnoreFound is true then next repeat

//Ok, begin counting the unique words

put 0 into varCount

put the number of words in field "original" into LL

repeat with j=1 to LL

put word j of field "original" into varTargetWord