I thought I would share a bit about my favorite mathematical method: Markov generation. I am not going to share any Markov generation scripts. Anyone who is willing to put a bit of time into learning PHP is more than capable of writing their own Markov text generation script. What you do with your newfound ability is up to you. "Give a man a fish, feed him for a day. Teach a man to fish, and feed him for the rest of his life."

Markov generation involves a statistical analysis of a given corpus of text. In layman's terms: if I look at the complete works of William Shakespeare, what word, or segment of characters, most often follows, say, "thou"? With that knowledge, we can tell this handy bit of plastic and silicon to randomly choose which word to place next. Let us say, for instance, that "shalt" is statistically chosen. We now have "thou shalt". The process repeats, and out pops our generated text.

The Nuts and Bolts

The first step is to create an index. We need some source text; generally, the larger the corpus, the more sensible the result. It is probably a good idea at this point to send the text through a series of preg_replace calls and clean out all of the line breaks and other noise. We then split the text into an array of strings, leaving a neat little array of the cleaned-up text.

The index is just a table listing the regularity with which one segment of text follows the previous segment, or several previous segments. We simply loop through our array and increment a counter each time a certain segment follows another. For example, "thou" could be followed by "shalt", "will", "shan't", and so on. Let us assume we end up with something like thou -> will(6), shalt(9), shan't(3). This tells us that half the time, "shalt" follows "thou". Now you have an index of the text.

A Random Walk

Now that we have our index, how do we generate text from it? That really depends on how your script will function.
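The PHP is left to you, as promised. Purely as an illustration of the indexing step described above, here is a minimal sketch of the same idea in Python; the function name build_index and every other detail here are my own shorthand, not anything from a real script.

```python
import re
from collections import defaultdict

def build_index(corpus):
    """Frequency index: word -> {following word: how often it follows}."""
    # Clean out line breaks and runs of whitespace, as described above.
    text = re.sub(r"\s+", " ", corpus).strip()
    words = text.split(" ")
    index = defaultdict(lambda: defaultdict(int))
    # Walk the array and increment a counter for each observed pair.
    for current, following in zip(words, words[1:]):
        index[current][following] += 1
    return index
```

Feed it "thou shalt thou will thou shalt" and you get an index where "thou" is followed by "shalt" twice and "will" once, which is exactly the thou -> shalt(9), will(6) style of table described above, just with smaller numbers.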
If you are generating individual sentences and then rolling them out as a paragraph, then you probably built an index of first words in the previous step by splitting the string by sentence. In that case, you simply start with a random first word (we will call it the token), search the index for that entry, and choose the following word randomly, weighted by frequency. The following word then becomes the new token; we look it up in the index and keep looping, letting the script randomly choose the next word based on frequency until we have our generated text.

If you are generating a whole chunk of text without regard for individual sentences, or you are using chunks of characters, then your index will probably include words, or characters, with capitalization and periods still tagged on the end. In that case, we can either clean out the partial sentence at the beginning with some regex after we generate the text, or search the index for a capitalized word to use as our first token. The same process applies: choose a token, randomly choose the next word based on frequency, set the new word as our token, and restart the loop.

Congratulations: you now have an extremely ugly, hardly readable chunk of generated text that sounds somewhat similar to the text you put into the script.

Water into Wine

This is the most important and most often under-appreciated step. The chunk of text we have isn't terrible, but it's ugly. It probably isn't very presentable. We need to throw a suit on it, teach it some public speaking, and get it out into the world. Run the script a few dozen times and take some notes. Write down anything that you don't like or don't want to appear in the finished product. Then go back to the script and add some regex, or other fixes, for the errors and problems you noted. We might want to delete repetitions of the same or similar text.
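The random walk itself can be sketched just as briefly. Again, this is illustrative Python rather than anything from a working script; the name generate and the dead-end handling are my own assumptions about one reasonable way to do it.

```python
import random

def generate(index, first_token, length=20):
    """Random walk over the index: each next word is chosen
    randomly, weighted by how often it followed the current token."""
    token = first_token
    output = [token]
    for _ in range(length - 1):
        followers = index.get(token)
        if not followers:
            # Dead end: the token never appeared mid-text, so stop early.
            break
        words = list(followers.keys())
        counts = list(followers.values())
        # Frequency-weighted random choice, exactly as described above.
        token = random.choices(words, weights=counts, k=1)[0]
        output.append(token)
    return " ".join(output)
```

With an index like {"thou": {"shalt": 9, "will": 6, "shan't": 3}}, the walk will pick "shalt" after "thou" about half the time, which is the statistical behavior the method is built on.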
We may need to add or remove some spaces in our corpus before we split it into the index. If you are really a glutton for punishment, you can run the output through a list of commonly misspelled words and correct them. How about adding a system so that tokens we choose as keywords carry a higher frequency? Maybe we create a massive database index of millions of chunks of text and change the script to start in the middle of a sentence with our keyword, then loop backward to the first word and forward to the first period. We might generate millions of sentences and have the script organize them by keyword phrase. What we will definitely do is crack a bottle of champagne and toast our newfound ability to create snake oil.
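To make the note-taking step concrete, here is what a few cleanup passes might look like, again as an illustrative Python sketch rather than any real script; the particular fixes (collapsing repeated words, tightening punctuation spacing, capitalizing sentence starts) are just examples of the kind of regex surgery described above.

```python
import re

def polish(text):
    """A few illustrative cleanup passes over generated text."""
    # Collapse immediate word repetitions ("the the" -> "the").
    text = re.sub(r"\b(\w+)( \1\b)+", r"\1", text)
    # Remove stray spaces before punctuation left over from a sloppy split.
    text = re.sub(r"\s+([.,!?])", r"\1", text)
    # Capitalize the first letter of each sentence.
    text = re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
    return text
```

Each pass here comes straight from a note you might take while running the generator: every problem you write down becomes one more substitution in the suit-fitting stage.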