I spent about two more hours on my little word cloud project. I knew that the next step of building the actual word cloud from the list of unique words and frequencies was definitely within reach. Here's an example of one of the first word clouds I built using the text of Abraham Lincoln's second inaugural address:
There is a "mild" hack at play here: the words simply go to a random spot within a square area of the screen. Each field containing a word has the script "grab me" in its mouseDown handler, so I can easily drag the words to more aesthetically pleasing locations. I decided it wasn't worth trying to figure out how to keep the words from overlapping. However, for an excellent example of how to accomplish this, check out the blog post by Ali Lloyd on his efforts to build a word cloud. Ali is one of the excellent professionals who work at RunRev (the parent company of LiveCode). Ali's solution is exactly what you think of when you think of a word cloud: words of different sizes, colors, and orientations filling every nook and cranny. It's really marvelous. So, many thanks to Ali for sharing this link with me in his comment on my previous blog post on this topic. I'll be studying his code for some time to come. (And any script that uses sines and cosines makes me want to purr.)
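For anyone who wants to try the same shortcut, here is a minimal sketch of the two pieces described above. The function name randomSpot and its parameters are hypothetical; called as randomSpot(100, 300), it covers the same range as the two random(300)+100 lines in the "Build Word Cloud" script at the end of this post. The mouseDown handler is the one-line "grab me" script that would go in each word field, while the function would live in the button script.

```livecode
// Hypothetical helper for the button script: returns "x,y" with each
// coordinate between pOffset + 1 and pOffset + pRange, because
// random(n) returns an integer from 1 to n.
function randomSpot pOffset, pRange
   return (random(pRange) + pOffset) & comma & (random(pRange) + pOffset)
end randomSpot

// Script of each word field: lets the user drag the word with the mouse.
on mouseDown
   grab me
end mouseDown
```

Because LiveCode's move command takes a point, the result could be passed straight to it, for example move it to randomSpot(100, 300) in 1 millisecond.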
How I Did It
OK, back to my humble attempt. One of the challenging parts of the project was figuring out the step-wise progression of font sizes. The above word cloud looks OK, but I was able to improve the word cloud algorithm in several fundamental ways. All of the code to build the word cloud is in the green button "Build Word Cloud" shown at the bottom of this post, but here are a few key highlights.
Font Size Step-Wise Progression
I refined the step-wise progression of font sizes so that the word with the highest frequency always gets a font size of 96 pixels. In the example above, the font size is directly proportional to the frequency. That was inadequate for several reasons, the most obvious being that the font sizes can differ radically among just the first few most frequently used words. So, I revised the script so that the next most frequently used word gets a font size 12 pixels smaller, no matter how many fewer times it was used, and so on down the list; words used the same number of times share the same size. For example, if one word was mentioned 100 times but the next most frequently mentioned word was used only 20 times, my revised script gives the second word a font size only 12 pixels smaller (84 pixels instead of 96). Think of 12 pixels as the height of the "step."
From a data visualization standpoint, that skews the proportions in an inappropriate way, but it makes for a more aesthetically pleasing outcome. So, I think it all depends on the purpose of the word cloud. For scientific purposes it is inadequate because it skews the output, but for a quick, eye-pleasing visual that gives the gist of a passage of text, it's fine. I also made 12 pixels the smallest font size that would be used. (In my original script, it was possible to have one word at 96 pixels and all remaining words at 12 pixels if the most frequently used word was mentioned an inordinate number of times compared to all other words.)
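To make the step idea concrete, here is a small sketch that turns a list of frequencies (one per line, sorted from most to least frequent) into font sizes. It follows the same rules as the "Build Word Cloud" script: 96 pixels for the top word, one step smaller each time the frequency drops, equal counts sharing a size, and a floor of 12 pixels. The function name stepFontSizes is hypothetical, and it uses "repeat for each" rather than the script's "repeat with".

```livecode
// Hypothetical helper: pFrequencies holds one count per line, sorted
// from highest to lowest; pStep is the font step in pixels.
// Returns one font size per line, in the same order.
function stepFontSizes pFrequencies, pStep
   local tSizes, tStepCount, tLastFrequency, tFrequency
   put 0 into tStepCount
   put line 1 of pFrequencies into tLastFrequency
   repeat for each line tFrequency in pFrequencies
      // Drop one step only when the count drops; ties keep the current size
      if tFrequency < tLastFrequency then
         add 1 to tStepCount
         put tFrequency into tLastFrequency
      end if
      // Never go below the 12-pixel minimum
      put max(12, 96 - tStepCount * pStep) & return after tSizes
   end repeat
   if tSizes is not empty then delete the last char of tSizes
   return tSizes
end stepFontSizes
```

For frequencies of 100, 20, 20 and 5 with a step of 12, this returns 96, 84, 84 and 72: the 80-use gap between the first two words costs exactly one step, the same as the 15-use gap between the last two.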
Adding Color
I also added the option to pick a color at random using the following code:
put random (255) into rColor
put random (255) into gColor
put random (255) into bColor
if varColor is true then
   set the foregroundcolor of it to rColor,gColor,bColor
else
   set the foregroundcolor of it to black
end if
You'll find these lines in the button script below. The first three lines just pick three numbers at random from 1 to 255 (LiveCode's random(n) returns an integer between 1 and n, so 0 never comes up). These are used to produce a random RGB color if the "Color" option is checked.
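If you want each color component to span the full 0 to 255 range, subtracting 1 from random(256) does it. Here is a minimal sketch that wraps the three random picks in one function; the name randomRGB is hypothetical, and it returns the "r,g,b" form that the foregroundColor property accepts.

```livecode
// Hypothetical helper: returns a random color as "r,g,b".
// random(256) yields 1..256, so subtracting 1 gives 0..255 per component.
function randomRGB
   return (random(256) - 1) & comma & (random(256) - 1) & comma & (random(256) - 1)
end randomRGB
```

In the button script, the three put random lines and the rColor,gColor,bColor expression could then shrink to set the foregroundcolor of it to randomRGB().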
These changes produced the following word cloud:
I know my graphic design skills don't qualify me to give any "expert" opinion, but it definitely seems like an improvement to me, at least aesthetically. Again, though, whether it gives a more accurate representation of the data is an important question to ask.
Another improvement was adding the "font step" slider at the top of the screen. When I ran the algorithm on different text passages, I found that 12 was not always the best size for the font step, so I decided it was better to let the user experiment with it. A final minor improvement, I think, is making sure that the words in the top two frequency steps (normally the two most frequently used words) are always shown in black for added emphasis. Here's a screenshot of the card that builds the word cloud.
I Found Two Golden Nuggets: formattedWidth and formattedHeight
One of the wonderful outcomes of building this project was discovering the "formattedWidth" and "formattedHeight" properties. Together, they do all of the hard work of figuring out exactly how wide and how tall a text field needs to be for its contents to fit perfectly within it. I didn't know about these properties when I was first building my Q Sort project, so I came up with my own (very imperfect) function to try to do the same thing. I've since updated my Q Sort app to use them. Here are the two key lines of code that accomplish this feat:
set the width of field "word object" to the formattedWidth of field "word object"
set the height of field "word object" to the formattedHeight of field "word object"
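These two lines can be wrapped in a small reusable handler. One detail worth knowing: when you change an object's width or height in LiveCode, it grows or shrinks around its center, so this sketch saves and restores the field's topLeft to keep it anchored. The handler name fitFieldToText is hypothetical; in the word cloud script the anchoring doesn't matter, because each word is moved to a random spot afterward.

```livecode
// Hypothetical helper: resize a field to fit its text exactly,
// keeping its top-left corner where it was (resizing is centered).
on fitFieldToText pFieldName
   local tTopLeft
   put the topLeft of field pFieldName into tTopLeft
   set the width of field pFieldName to the formattedWidth of field pFieldName
   set the height of field pFieldName to the formattedHeight of field pFieldName
   set the topLeft of field pFieldName to tTopLeft
end fitFieldToText
```

Usage: fitFieldToText "word object" in place of the two set lines above.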
Next Steps
As I looked at Ali Lloyd's code, it reminded me of Richard Gaskin's advice to me almost a year ago to use the "repeat for each" form of going through a list of data rather than the "repeat with" approach that I have become so fond of. With "repeat with," every reference such as line i of a field makes LiveCode count through the text from the beginning again, so the loop slows down as the text grows; "repeat for each" walks through the data just once. My code that computes the frequency of the words in a passage of text is painfully slow, so I'm now very motivated to get with it and try out the "repeat for each" approach. So, look for at least one more update on this little project.
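Here is a minimal sketch of what a "repeat for each" version of the frequency count might look like. It is not the code behind this post: the function name wordFrequencies is hypothetical, it assumes the output should match the word,count lines used in the "word frequencies" field (most frequent first), and lowercasing plus stripping punctuation are my own assumptions about what counts as the "same" word. LiveCode's word chunk keeps punctuation attached (for example "war,"), which is why the sketch cleans each word.

```livecode
// Hypothetical helper: counts how often each word appears in pText and
// returns lines of the form word,count, most frequent first.
function wordFrequencies pText
   local tCounts, tWord, tClean, tList
   // "repeat for each" walks through the text once instead of
   // re-counting from the start on every pass, as "word i of pText" would
   repeat for each word tWord in pText
      // Assumption: ignore case and strip everything but letters and apostrophes
      put toLower(replaceText(tWord, "[^A-Za-z']", empty)) into tClean
      if tClean is empty then next repeat
      // A missing array element counts as 0, so this creates or increments it
      add 1 to tCounts[tClean]
   end repeat
   // Turn the array into word,count lines and sort by count
   repeat for each key tWord in tCounts
      put tWord & comma & tCounts[tWord] & return after tList
   end repeat
   if tList is not empty then delete the last char of tList
   sort lines of tList descending numeric by item 2 of each
   return tList
end wordFrequencies
```

For the text "The war, the WAR; the peace." this gives the,3 then war,2 then peace,1.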
As always, the bottom line for me is that I continue to learn new things every time I build a LiveCode project, however small. But is there really any other way?
Script on the Button "Build Word Cloud":
on mouseUp
   //Erase any existing word cloud first
   put the number of fields into L
   repeat with i = 5 to L-1
      put i-4 into j
      put item 1 of line j of field "word frequencies" into varFieldName
      put varFieldName into message
      delete field varFieldName
   end repeat

   put false into varColor
   if the hilite of button "color" is true then put true into varColor
   put the thumbposition of scrollbar "fontdifferencebar" into varFontChangeAmount

   //Build the word cloud
   set the movespeed to 0

   //Determine the largest frequency - this will get the largest font size in the word cloud
   put item 2 of line 1 of field "word frequencies" into varMaxFrequency
   put 0 into varFontDifference
   put field "minimum frequency" into varMinFrequency

   //This is the repeat loop that will create each word, resize its text, then move it
   repeat with i = 1 to the number of lines in field "word frequencies"
      put random (255) into rColor
      put random (255) into gColor
      put random (255) into bColor
      if item 2 of line i of field "word frequencies" < varMaxFrequency then
         add 1 to varFontDifference
         put item 2 of line i of field "word frequencies" into varMaxFrequency
      end if
      if item 2 of line i of field "word frequencies" < varMinFrequency then exit repeat

      //The next two lines determine the area of the screen where the word cloud will be built
      put random(300)+100 into x
      put random(300)+100 into y

      //Create the next word for the word cloud
      copy field "word object" on card "library" to this card
      hide it
      if varColor is true then
         set the foregroundcolor of it to rColor,gColor,bColor
      else
         set the foregroundcolor of it to black
      end if
      if varFontDifference<2 then set the foregroundcolor of it to black
      put item 1 of line i of field "word frequencies" into field "word object"

      //Determine font height for the word
      //put varMaxFrequency - item 2 of line i of field "word frequencies" into varFontDifference
      put 96-(varFontDifference*varFontChangeAmount) into varTextSize
      if varTextSize < 12 then put 12 into varTextSize
      set the textSize of field "word object" to varTextSize
      set the width of field "word object" to the formattedWidth of field "word object"
      set the height of field "word object" to the formattedHeight of field "word object"

      //Rename the newly copied field as the word it contains
      set name of field "word object" to item 1 of line i of field "word frequencies"

      //Move the word to a random spot within the word cloud screen area
      move it to x,y in 1 millisecond
      show it
   end repeat
end mouseUp
Postscript: About My Formatted Code
Ali Lloyd's post reminded me that the way I've been showing code in my blog posts has been really terrible, so I did the obvious thing and googled "showing code in a blog post using blogger" and quickly found a great tool:
Hi Lloyd, I enjoyed this post -- it's very informative, and it's great to help follow your thought processes.
With regard to formatting your code, you may find it easier to read on the blog if you include some more vertical whitespace. For example, if you look at this script in the LiveCode source code, you can see that blank lines and formatted comments are used to help divide the code up into logical sections.
Also, I think Ali has some JavaScript that can be used to help format LiveCode script on a web page with syntax highlighting. I'll see if I can find out from him!
Thanks for your comment, Peter. I was glad to find the codeformatter tool, but I'd definitely be interested in better approaches that improve the formatting of the LiveCode script in my blog.