Some elementary text processing with Unix:

I downloaded the archives of the Usenet newsgroup comp.ai.alfie for 94-09-20. There were 65 individual files, which I concatenated into a single file with the command:
          cat * > bigfile
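A minimal sketch of the concatenation step, using three made-up files in a temporary directory. The names archive, part1, part2 and part3 are hypothetical stand-ins for the 65 downloaded files. The shell expands * in name order before it performs the redirection, so on a first run bigfile is not part of its own input; writing the output to another directory, as assumed here, also keeps a later rerun from reading bigfile back in.

```shell
set -e
# Hypothetical sample files standing in for the downloaded archive.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/archive"
printf 'first article\n'  > "$work/archive/part1"
printf 'second article\n' > "$work/archive/part2"
printf 'third article\n'  > "$work/archive/part3"

cd "$work/archive"
# * expands to part1 part2 part3 in sorted order; cat writes them one after
# another. The output lives in the parent directory, outside the * pattern.
cat * > ../bigfile
cat ../bigfile
```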

tr command (translate)

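tr 'aeiou' 'xxxxx' <bigfile >xfile creates a file called xfile in which every occurrence of the lower-case letters 'a', 'e', 'i', 'o' and 'u' is turned into the letter 'x'. GNU and BSD tr also accept a single 'x' here and repeat it to the length of the first set, but System V tr would then translate only the 'a'.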
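tr -c 'aeiou' '[x*]' <bigfile >yfile creates a file called yfile in which every byte other than the letters 'a', 'e', 'i', 'o' and 'u' (capitals, spaces and newlines included) becomes an 'x'. The -c option takes the complement of the first set, and '[x*]' repeats 'x' as often as needed to match it; GNU and BSD tr do the same with a plain 'x'.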
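Both tr variants can be tried on a short sample line before running them on bigfile. The sample text below is made up, and 'xxxxx' and '[x*]' are used so that the sketch does not depend on how a particular tr pads a short second set.

```shell
sample='Hello, world'

# Each lower-case vowel is replaced by the x in the same position of 'xxxxx'.
vowels_to_x=$(printf '%s\n' "$sample" | tr 'aeiou' 'xxxxx')
printf '%s\n' "$vowels_to_x"

# -c complements the first set, so every byte that is not a lower-case vowel
# (capitals, punctuation, spaces and the trailing newline) becomes x.
# '[x*]' repeats x as often as needed to match the complemented set.
others_to_x=$(printf '%s\n' "$sample" | tr -c 'aeiou' '[x*]')
printf '%s\n' "$others_to_x"
```

On this sample the first command gives Hxllx, wxrld and the second gives xexxoxxxoxxxx; the last x there is the translated newline, which is why yfile ends up as one long line.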

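tr -c 'A-Za-z' '\012' <bigfile >zfile creates a file called zfile in which every byte other than a letter is converted to a newline ('\012' is the octal code for the newline character). Wherever several non-letters follow each other, this leaves several newlines in a row, i.e. empty lines. To get a file with one word per line, add the -s (squeeze) option, which collapses each run of repeated newlines into a single one: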

tr -cs 'A-Za-z' '\012' <bigfile >wfile
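The effect of -s is easiest to see side by side. This sketch runs both forms on a made-up sample line and keeps the results in shell variables instead of writing zfile and wfile.

```shell
sample='Hi -- it is 1994; read comp.ai.alfie!'

# Every non-letter, including the trailing newline, becomes its own newline,
# so runs such as " -- " leave empty lines behind.
plain=$(printf '%s\n' "$sample" | tr -c 'A-Za-z' '\012')

# -s squeezes each run of repeated newlines in the output to a single one,
# which leaves exactly one word per line.
squeezed=$(printf '%s\n' "$sample" | tr -cs 'A-Za-z' '\012')

printf '%s\n' "$squeezed"
```

The digits of 1994 disappear too, since only A-Z and a-z count as letters in this command.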

Next, convert all upper-case letters to lower case, so that "The" and "the" are counted as the same word:

tr 'A-Z' 'a-z' <wfile >lowerfile
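A quick check of the lower-casing step on a few made-up words. The second command uses the POSIX character classes [:upper:] and [:lower:], which give the same result on this input without relying on the collating order of the range A-Z.

```shell
words='The
the
THE
Alfie'

# Translate each character of A-Z to the one in the same position of a-z.
lower_range=$(printf '%s\n' "$words" | tr 'A-Z' 'a-z')

# The same translation written with POSIX character classes.
lower_class=$(printf '%s\n' "$words" | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$lower_range"
```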

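Now sort the file, pipe it to uniq -c and sort the result again, numerically and in descending order. The first sort is needed because uniq only merges identical lines that are adjacent; uniq -c prefixes each distinct line with the number of times it occurs, and sort -nr then puts the largest counts first: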

sort lowerfile|uniq -c|sort -nr >countlist
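Why the first sort matters can be seen on a tiny made-up word list. The awk step is an addition for this sketch only: it strips the padding that uniq -c puts in front of each count, so the two results are easy to compare.

```shell
sample='the
cat
the
dog
the
cat'

# Unsorted input: no two equal words are adjacent, so uniq -c reports
# every line with a count of 1 and "the" appears three separate times.
unsorted_counts=$(printf '%s\n' "$sample" | uniq -c | awk '{print $1, $2}')

# Sorting first groups equal words; sort -nr then ranks them by count.
sorted_counts=$(printf '%s\n' "$sample" | sort | uniq -c | sort -nr | awk '{print $1, $2}')

printf '%s\n' "$sorted_counts"
```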

The file countlist now contains every distinct word of the original file, preceded by the number of times it occurs, with the most frequent words first.
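The steps above can also be chained into one pipeline, so that the intermediate files wfile and lowerfile are not needed. This is a sketch on a made-up two-line sample in a temporary directory; sample.txt is a hypothetical stand-in for bigfile.

```shell
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'The cat saw the dog.\nThe dog saw THE cat, then left!\n' > "$work/sample.txt"

# One word per line, lower-cased, then counted and ranked by frequency.
tr -cs 'A-Za-z' '\012' < "$work/sample.txt" |
  tr 'A-Z' 'a-z' |
  sort | uniq -c | sort -nr > "$work/countlist"

# Show the most frequent words.
head -n 3 "$work/countlist"
```

On this sample the first line of countlist is the count 4 for "the"; the order among words with equal counts depends on how sort breaks ties.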