Saturday, April 13, 2013

Creating Unique Random Codes

Did you ever look at a YouTube video link and wonder where that odd set of characters at the end comes from? Here's an example (the video is one that yours truly made):

https://www.youtube.com/watch?v=xSnnrfNO-o0

I've always assumed labels like "xSnnrfNO-o0" are just a string of 11 random letters, numbers, and characters. Lots of sites use the technique of generating random strings like this for links, passwords, and special codes. (Tinyurl.com is another good example.) Now, it is one thing to create a random string, but it is quite another to make sure that every random string is completely unique from all others. Obviously, no YouTube video can have the same exact string, otherwise there would be chaos, havoc, and no way for grown people to watch their favorite puppy or kitten video when the mood strikes.

So, I set out to create a LiveCode project that created a list of unique, random strings of letters, numbers, and characters. For example, consider the number of unique two character codes that one could generate from the following: a, b, and c.

We can easily figure out that there must be 9 such codes:

aa
ab
ac
ba
bb
bc
ca
cb
cc

But, what if I wanted to create unique three character codes (e.g. abc, aab) or four character codes (e.g. aaab, aaba). How many would there potentially be? So, let's consider the math. This is an example of a permutation. There is a simple formula to answer this question:

nr
(This is read as "n to the power of r.")

Where n is the number of things to choose from, and you choose r of them (also note that repetition is allowed and order matters).

If you want to know more than that about the math, here's a good Web site that will explain both combinations and permutations:

http://www.mathsisfun.com/combinatorics/combinations-permutations.html

So, in the example above, n is 3 (abc) and r is 2 (codes with two characters, or "slots"). 32 is 3 times 3, or 9.

If we wanted to generate 3 character codes, it would be 33, or 3 times 3 times 3, or 27.

In YouTube's case, r would be 11 because there are 11 characters -- let's call them "slots" from here on out -- in the code. But from what group of characters is YouTube using? Well, from the one example above, it seems to consist of lower case letters, upper case letters, numbers, and special characters (e.g. hyphens and the like). If we just stick to letters and numbers, the list would contain 62 different characters (26 lower case letters, 26 upper case letters, and the 10 digits). How many different codes with 11 slots could we construct from this list?

It would be 6211, or -- in scientific notation -- 5.2036561e+19, or...

52,036,561,000,000,000,000 unique codes. In short, a lot.

Now, allow me to repeat myself and say that it is one thing to generate a single code, but it quite another to make sure that the next code is guaranteed to be different from all others generated so far.

So, how does YouTube or any other site do this?

I can quickly see three ways of doing this:

1. Given a list of letters, numbers, or characters, select one of these at random for each slot. As each code is created, add it to an ever growing list, then every time you create a new code, first compare it to all of the codes in this list. If it matches any of them, throw it out and start over. If no matches are found, give out the code and then add the code to the list.

2. First generate the list of all possible codes (i.e. permutations), then begin assigning the codes one at a time. There will be no need to check against the list because you made sure that all codes on the list are unique before you start handing out the codes.

3. Only pretend that the codes are randomly generated. After all, who's going to check? Use some secret coding system where 1 is an a, 2 is a b, # is a z, and so on. Then, you can just start a counter at 10000000001 and generate codes sequentially. They may look random, but anyone who knows the "secret system" could work backwards to figure out what code will come next.

If you see other ways, share them with me.

I'm going to immediately dismiss #3, mainly because there are plenty of people in this world with loads of time on their hands who will likely figure out what's going on. Of choices #1 and #2, #2 seems the best strategy because of the simple fact that it would take the least amount of time to implement. Let some computer take the weekend to generate the first couple billion of the codes and put them in a list, then start pulling the codes from that list one by one. The only downside is that you have to store a very big list of codes somewhere and keep that list safe. And, it's hard to tell just how big to make that list. If you create the next YouTube, you better have a long list. But, if you create Web sites as popular as mine, 10 or 20 codes will likely do the trick.

Option #1 looks attractive at first, but stop and think about it for a second with a slight air of confidence. After a little while, if the site you've built is modestly successful, the list of codes could easily grow to thousands or millions. (How many YouTube videos have been uploaded in the time it takes you to read this?) Every time a new code is generated, the computer has to compare that code first to *every single previous code* in the list before giving it out. That will take some serious processing.

So, ponder that for awhile while we take the first step that applies to both options #1 and #2, namely that we have to learn how to generate a list of unique codes. So, let's start with that. First, here is a screen shot of my LiveCode program:

Here are the directions of how to use the program:

1. The "code elements" are the things that make up the code. Scroll through the list and you'll see that I'm using all lower case letters, all upper case letters, and all 10 digits, for a total of 62 characters. Notice that each element needs to be on its own line. This field is editable, so feel free to change it as you wish, but just make sure that only one element is on each line.

2. Enter the number of code slots. That is, how many characters will make a code. The code "aa" contains 2 slots, whereas "ad4" has 3 slots, and "gK85ty9" has 7 slots. Enter any number you wish (but I suggest keeping it relatively small -- under 10).

3. Press the button "Create Random Codes" and watch the magic.

In the above example, I'm generating codes with only 2 slots, so 62 raised to the power of 2 equals 3844 total possible unique codes. As you can see, after generating 501 unique codes, a total of 36 duplicates were found along the way. But don't worry, these were deleted as soon as they were created. The field labeled "Codes Created" stores all of the unique codes. So, if you were to scroll through the list above, you would find 501 unique codes.

4. Click "Stop!" at any time to tell LiveCode to stop at that point. (Depending on the number of slots and the total number of elements, it could take a very long time to generate all the unique codes.)

I find it fascinating to see how fast or slow the unique codes and the duplicates are generated. Not surprisingly, early on there are few duplicates, but this dynamic certainly changes as the calculation engine cranks out a bunch of codes. (It would be cool to graph this dynamic.)

How did I program this in LiveCode? Let's just focus on the basics. (Most of the scripting, as is typical, is actually for the interface design. So, I'll deliberately ignore that stuff for now knowing that you can explore the raw code on your own. But, I'm happy to explain anything further, so just ask.)

First, we need a text field to store the group of characters from which we will build the codes. I'll name this field "code elements." Each line consists of a single element (e.g. "a" on line 1, "b" on line 2, etc.)

We will create a variable -- varTotalCodeElements -- that will contain the total number of lines with an element in it:

put number of lines in field "code elements" into varTotalCodeElements

We can then use this script to select one of the elements at random and put it into the variable "varA":

put line random(varTotalCodeElements) of field "code elements" into varA.

To build the code -- let's use the variable "varCode" to store it -- we need to create a repeating loop that repeats as many times as slots in the code. So, if we want a code with five slots (e.g. abcde), the code would look like this:

   repeat 5 times
      put line random(varTotalCodeElements) of field "code elements" into varA
      put varCode&varA into varCode
   end repeat


Recall that the ampersand (&) concatenates, or joins, string elements. So, varCode builds by one character each time the loop repeats.

Of course, we should build this so that our program is flexible enough to build codes that contains any number of slots. So, let's use the variable "varTotalCodeSlots" to store that number:

   repeat varTotalCodeSlots times
      put line random(varTotalCodeElements) of field "code elements" into varA
      put varCode&varA into varCode
   end repeat 


It's important to note at this point that we also have all the information we need to determine the total number of possible permutations, that is, the total number of unique codes possible. Here's the script:

   put varTotalCodeElements^varTotalCodeSlots into varTotalCodesPossible

The symbol ^ is used to represent "to the power of." So, the value of the variable "varTotalCodesPossible" will contain the value of varTotalCodeElements raised to the power of varTotalCodeSlots. (If you download and run the final program, you will see where I use it.)

We will be adding all unique codes generated to an ever growing list of codes. So, let's get that part ready by creating another field to hold these codes. We'll simply call that field "codes." So, to add a code to this list, we first have to determine how many codes are already in this list, then we add the most current code generated to the very next line. These two lines of script handle this:

    put number of lines in field "codes" into x
    put varCode into line x+1 of field "codes"

(Side note: the variable x is a local variable, so I usually don't bother with long naming conventions given that local variables are only temporary, unlike global variables.)

Of course, before we add a code to this list, we have to first determine if the code is already contained somewhere on the list. How is that done? Answer: by brute force. That is, we will generate a random code, then compare it to each and every code already in the field "codes." A tedious task that the computer loves to do. Here's the script:

   put number of lines in field "codes" into x
   repeat with i = 1 to x
     put line i of field "codes" into y
      if varCode = y then
         //Sorry, it's already been taken, so stop, and try again.
         exit repeat

        //go and generate another random code and start the repeat loop again
      end if
   end repeat


    //Yippee! This one hasn't been used yet! Add it to the list.
    put varCode into line x+1 of field "codes"
 (Note: y is another local variable.)

The construction of the repeat statement in line 2 basically says "begin a repeat loop that puts the value 1 into a local variable i; keep repeating the loop, adding 1 to i each time; stop repeating when you reach the value of x."

The local variable y contains the code stored the current line of field "codes" being examined (i.e. when i is 1, line 1 is being examined; when i is 9, line 9 is being examined). The computer carefully compares each varCode to y and determines if it is a match. If it is, then the repeat loop is aborted with the "exit repeat" command. But, if the computer goes through each line of field "codes" and does not find a match, it concludes it is a unique code and it is added to the list.

Those are the basics of how options #1 and #2 work. The difference is that option #1 does this comparison "on the fly," whereas option #2 grinds out the entire list ahead of time. In a sense, the LiveCode example I created does both. Again, I added lots of little extras to make the program more functional and understandable. (A reminder that the default set of characters I use in the program consists of 26 lower case letters, 26 upper case letters, and the 10 digits -- a total of 62 characters in all).

Try it out and see what you get.

A final note...

I thought the program was working fine, but then I did a test of all 62 characters with a slot number of 1. In other words, I was asking the computer to find just the 62 characters. Every time I ran the program, it quickly found 36 characters, but then just seemed to spin its wheels. It kept chugging along, trying to find a unique code, but would never go beyond 36. I began to worry that my logic had some fatal flaw in it that was only now revealing itself with this simple test. However, when I used only the numbers 0 to 9, there was no problem. Likewise, there was no problem when I went back and redid my initial tests using just the letters a, b, and c. And then it hit me: 26 letters in the alphabet plus the 10 digits also equals 36. This indicated to me that the computer was not distinguishing between upper and lower case letters. A quick search in the LiveCode dictionary on the keyword "case" revealed the property "caseSensitive," which "specifies whether comparisons treat uppercase and lowercase letters as different." It also stated that "by default, the caseSensitive property is set to false."

So, I added the line "set the caseSensitive to true" as the very first command in the procedure and all seems to work just fine.

The Free Open Source LiveCode Community Edition Just Released

RunRev just released their free, open source version of LiveCode, called LiveCode Community Edition. Here's a link with all of the information:

LiveCode Community Edition

This is not a demo version, it is the real deal. You get a fully functioning version of LiveCode. There are a few catches. Here are the two that jumped out at me:

  • All of the programs you create must be open source when you distribute them.
  • You cannot publish apps to Apple's App Store.
Not being able to publish apps to Apple's App Store is the most significant limitation for me. One must purchase the commercial license ($500 annual subscription) to do that. However, you can apparently program your app completely in the free version, then purchase the commercial license when you are ready to submit to Apple.

All in all, I think this is fantastic, especially for educators like myself.

And, if you didn't have LiveCode already, you can download it for free right now and begin exploring the scripts to all the stuff I'm sharing in this blog. So, what are you waiting for?

Thursday, April 4, 2013

Crack the Code!

I have designed a little game based on the encryption scripts I have discussed in my recent postings. I'm calling it "Crack the Code."

The goal of the game is to identify an "encrypted" word as fast as you can without making any mistakes. If you make even one mistake, the game is over. The player's score is based on the time it takes to identify the word in the group, using a countdown timer starting at 20. So, if you identify the word immediately, you get 20 points. If it takes you 15 seconds to determine the word, your score is 5. The game is also over if the timer reaches 0.

The encrypted word is among a group of five chosen at random from a list of over 1600 six-letter words. I'm using a list I found at the following web site:

http://www.easysurf.cc/list6.htm

I found this based on the following Google search: "list of words with only 6 letters." As it turns out, people have compiled lists of words with varying lengths (people with time on their hands). This one seemed as good as any, though I have found myself having to remove some questionable words from the list. So far nothing too bad, just those sure to make any normal fifth grader laugh or smirk (e.g. rectum, nudity, dickey, tampon). However, I keep coming across words that are questionable, so I'll definitely need to print out the list at some point and review it carefully. (If you download and play the game, consider the word list appropriate for only those over 18 for now. Better yet, substitute your own word list instead.)

The game uses all of the scripts discussed in my previous posts, though of course I had to add lots more coding to create the game. There are far too many to go into any great detail here, but I will discuss a few. First, here is a screen snapshot of the game, along with the LiveCode file. I've also exported the game for both Macintosh and Windows for those of you who don't have LiveCode but who might want to play game:


I'm debating whether I'll do any more development on this little game. I think I'll share it with a few people to see if they find it enjoyable (or maybe hear from some of you reading this posting). If I get some positive vibes about it, I may be motivated to go further. One obvious enhancement would be to store the high score to give the player something to shoot for. Another obvious enhancement would be to let people update or change the word list. When I taught fifth grade, my students were confronted with learning a weekly list of spelling words, so I think this game would make a fun way to learn the words. (Do fifth graders still have to learn how to spell weekly word lists?)

OK, here are some of the most important programming notes:

First, note that the list of words are stored in a text field called "words" on a separate card (named "data"). I just copied and pasted the word list into the field.

Next, check out the script on the card "game." Yes, there is  lot of code on here, but if you scroll down to the very bottom, you will find the following:

on opencard
   hide button "New Game"
   hide field "bonus letter"
   newGame
   newWord
   computeTime
end opencard

So, when the card is first opened, this script takes over. This is actually a nice example of using the "divide and conquer" strategy for programming by creating custom procedures. And, procedures can easily be reused where ever it is appropriate to do so. For example, you will find I use the "newWord" procedure elsewhere to begin a new round within a single game.

The script to create the timer deserves some mention, as it shows a good approach to using time as a game variable. As I mentioned above, in this game you have 20 seconds to choose one of the five words. The quicker you are, the more points you get because the number of seconds remaining is your score for that round. However, the game ends if you choose the wrong word in any round. This 100% accuracy creates a a healthy game "tension," in my opinion -- you want to be quick, but you dare not make a mistake. Here is the script for "computerTime":

on computeTime
   put 20 into varTimeLimit
   if varGameActive is true then
      put the seconds into gVarTimeEnd
      put varTimeLimit-(gVarTimeEnd-gVarTimeStart) into gametime
      put "Time: "&gametime into line 1 of field "display time"
      send "computeTime" to me in 50 milliseconds
      if gametime<= 0 then TimeUp
   end if
end computeTime

Line 2 makes 20 seconds the time limit to choose a letter. If the clock gets to 0, another procedure I created -- "TimeUp" -- is triggered in line 8.

The global variable "varGameActive" is either true or false. The clock is only running when true.

Line 4 puts the current time, in seconds, into the variable "gVarTimeEnd." ("Seconds" is actually a system variable that contains this information.) There is a similar line of code with the variable gVarTimeStart that is executed near the start of the procedure "newWord." So, line 5 subtracts gVarTimeStart from gVarTimeEnd to determine the number of seconds that have transpired. However, since we want the game action to feature a time that counts down from 20, we have to subtract that number from varTimeLimit.

Now, the timer has to be running continuously as it shows the player how much time is remaining, so we need to put this procedure in a loop that never ends so long as varGameActive is true. So, notice line 7:

send "computeTime" to me in 50 milliseconds

This line probably looks a little odd, but you will see this construction often in LiveCode, so it's a good idea to get comfortable with it. It's also quite powerful. Anytime you see the word "me," LiveCode is referring to the particular object within which the line of code is located. It could be a button, a procedure, a card, etc. -- any of the LiveCode objects. This is a handy and quick way to refer to the object you are working on without having to name it explicitly. "Me" is a relative term that LiveCode understands. In this case, "me" refers to the procedure "computerTime."  50 milliseconds is obviously not a long time. Basically, this line just says "Wait just a moment, then come back to me." Why 50 milliseconds, and not say 10, 70, or 100? I'm not sure. I saw this line of code in another example on the RunRev web site, so I adopted it. (Perhaps I, or you, will play around with this at some point.)

(You might be wondering why I named these variables with a "g" in front, seemingly breaking from my naming convention. Well, I had first written this code for another game, and I was using the convention of putting the lower case g in front of all global variables. I've stopped doing that in more recent projects, but because I was copying and pasting, I just decided to leave the variable names as they were.)

Let's turn our attention to the procedure "newWord." This procedure chooses five random words are chosen from the list.  As mentioned above, the words are stored in a field called "words" on a separate card (named "data"). Here is part of the scripting convention for choosing the five random words:

   put random (the number of lines in field "words" on card "data") into x
   put line x of field "words" on card "data" into varWord1
   repeat forever
      put random (the number of lines in field "words" on card "data") into x
      put line x of field "words" on card "data" into varWord2
      if varWord2 <> varWord1 then exit repeat
   end repeat

Line 1 first counts the number of lines in field "words" and puts that number into x. As simple as this is, it is very important because you can substitute any list of words you want without worrying about how many there are. There are about 1600 in the list now, but you could pare this down to just 5 if you wanted to (of course, the same five words would then be chosen each time, making for a pretty boring game). And no, I don't know the limit as to the number of words (i.e. lines) that can hold data in any single text field (I should look that up). Line 2 puts that random word in the variable "varWord1".

Next, there are four groups of code that use the "repeat forever" command, one each for the remaining variables holding the words chosen at random (e.g. varWord2, varWord3, varWord4, varWord5). Lines 4 and 5 basically repeat lines 1 and 2, substituting varWord2 for varWord1. Line 6 is the key. It repeats the loop forever until varWord1 and varWord2 are different words. In a list of 1600 words, this shouldn't take long.

So, by the time I get to choosing varWord5, line 6 looks like this:

if varWord5 <> varWord1 and varWord5 <> varWord2 and varWord5 <> varWord3 and varWord5 <> varWord2 then exit repeat

I'm simply comparing varWord5 with the other four words already chosen with the computer repeat the loop until a unique word has been found.

You might recall from an earlier post that I shunned this strategy for choosing unique elements in a list because it might take the computer a long time if the list was very small. In an earlier post I had shown a clever strategy for making sure that a unique element would be chosen at each random number. Given that I have over 1600 words, I figured this was not necessary, plus I was wanting to try something new. So, what happens if you pare the list down from 1600 words to just, say, 6? Well, I tried doing and the performance was excellent -- no noticeable lag. As Mr. Spock would say, "Interesting."

There is lots more to write about, and I may do so in later posts, but let me end on the topic of sound effects. In short, I think they are extremely important. Because time is of the essence, the sound effects give you all the feedback you need to know if you have or have not been successful. But what if you have a hearing impairment and can't hear the sounds. The game just continues to the next word if you are successful versus providing the player with feedback if they are not. Still, it would be worthwhile to explore providing some kind of visual feedback -- perhaps a screen flicker -- if the guess is correct.

Play the game and let me know if even this primitive draft is any fun.