“Screwing around” with Datasets (Text Analysis)

I know the phrase ‘screwing around’ got you thinking: what the hell is he/she talking about, right? Well, contrary to what your dirty little minds suggest, ‘screwing around’ in this context means browsing: what you do when you are not quite sure what you are searching for, as Stephen Ramsay (2014) nicely explains.

What is text analysis? Text analysis tools break a text down into smaller units such as words, sentences and passages, and then gather these units into new views on the text that aid interpretation. Geoffrey Rockwell gives a brief history of electronic texts and text analysis. According to Rockwell, a good way to understand text analysis is to look at the tradition of concordancing from which it evolved. A concordance is a standard study tool in which one can look up a word and find references to all the passages in the target work where that word occurs. Concordances are alphabetically sorted lists of the vocabulary of a text (its different words or phrases). Occurrences of each word (the keyword) appear under a headword, each one surrounded by enough context to make out the meaning, and each one identified by a citation that gives its location in the original text.
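To make the idea of a concordance concrete, here is a minimal Python sketch of a keyword-in-context lookup. It is only an illustration, not how Wordle, Many Eyes or Voyant work internally: the function name `concordance`, the `width` parameter and the sample text are all made up for this example, and a character offset stands in for a proper citation. It also looks up one keyword at a time rather than building the full alphabetical list described above.

```python
import re


def concordance(text, keyword, width=30):
    """Return (offset, context line) pairs for every occurrence of keyword.

    Hypothetical helper for illustration only.
    """
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        start, end = match.span()
        # Take up to `width` characters of context on each side.
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # The character offset serves as a crude citation into the text;
        # the left context is right-aligned so the keywords line up.
        lines.append((start, f"{left:>{width}}[{match.group()}]{right}"))
    return lines


# Made-up sample text, not one of the lab datasets.
sample = ("Text analysis tools break a text down into smaller units. "
          "A concordance lists every passage where a word occurs in the text.")

for offset, line in concordance(sample, "text"):
    print(f"{offset:>4}: {line}")
```

Each printed line shows where the keyword occurs and enough of the surrounding words to judge its meaning, which is roughly what a printed concordance entry does.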

In the DITA lab session, #Citylis students had a wonderful experience screwing around with Wordle, Many Eyes and Voyant, all web-based tools that aid text analysis. The datasets that we were urged to save in previous lab sessions were used to experiment in Wordle and the other tools. Wordle generates word clouds from those datasets, and adds flair to your text by letting you change fonts, layouts and colour schemes. Below is an example of a text analysis produced in Wordle (#citylis top tweeters):


Loading datasets of journal articles into Many Eyes for text analysis produced the following:


Loading datasets of journal articles into Voyant produced the following:


Interesting as it is to work with these web-based tools, Jacob Harris views word clouds as harmful to journalism. For what it's worth, I enjoyed screwing around with the text analysis tools!