Publishing and Disruptive Technology

The second session of LAPIS was very informative. We discussed Printing/Reading/Books: From the knowledge to the sharing economy. Here, however, I will focus on what I was most drawn to during the lecture.

What is publishing now? Years ago, book publication was solely the responsibility of publishing houses. The manuscript went through processes such as proofreading and editing, the typesetting and cover design were settled, and the book was then printed and distributed to bookstores around the world. That older process may now be thought of as slow compared with the way information is distributed and disseminated today, thanks to ‘disruptive technology/innovation’.

The term ‘disruptive technology’ was coined by Clayton Christensen to describe what happens when a new form of technology displaces an established one and shakes up the industry, or when a ground-breaking product creates a completely new industry.

According to John Feather (2006), publishing is the commercial activity of putting books into the public domain. Publishers decide what to publish and then cause it to be produced in a commercially viable form, e.g. e-books; the product is then advertised and sold through a network of wholesalers and retailers.

Further, we discussed Marshall McLuhan’s phrase ‘the medium is the message’, which is still quite relevant in today’s society. It appeared in his book ‘Understanding Media: The Extensions of Man’, published in 1964. McLuhan saw the medium as having a symbiotic relationship with its audiences, since the medium influences how the message is perceived.

Clearly, publishing has undergone many fundamental changes over the years, and as my classmate Dominic states, ‘the publishing sector has severely been disrupted by the emergence and growth of internet-based technologies – notable examples include the development of self-publishing as a viable business model, the development of e-readers and the replacement of traditional encyclopedias and reference sources with free, sourced alternatives such as Wikipedia.’ …

Until next week!

Data and Text Mining

What is data and text mining? Data mining is a class of database applications that look for hidden patterns in a group of data that can be used to predict future behavior. It uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions and anomalies that might otherwise go unnoticed. Its key properties are the automatic discovery of patterns, the prediction of likely outcomes, the creation of actionable information, and a focus on large data sets and databases. Text mining, on the other hand, is the analysis of data contained in natural language text. It works by converting words and phrases in unstructured data into numerical values, which can then be linked with structured data in a database and analyzed with traditional data mining techniques.
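To make the idea of turning words into numerical values concrete, here is a minimal sketch in Python. It assumes the simplest possible approach, counting how often each word occurs in each text; real text mining tools do considerably more. The function names and the two sample sentences are hypothetical and invented purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, rows): one row of word counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # Shared, alphabetically sorted vocabulary across all documents.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words a document does not contain.
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical sample texts, not taken from any real dataset.
docs = [
    "The prisoner was tried for theft.",
    "The theft was reported and the prisoner fled.",
]
vocabulary, rows = term_document_matrix(docs)

for word, *column in zip(vocabulary, *rows):
    print(f"{word:10s} {column}")
```

Each text becomes a row of numbers over the same vocabulary, and a table like this is the kind of structured data that traditional data mining techniques can then work on.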

The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts. Databases are designed for programs to process automatically; text is written for people to read. We do not have programs that can “read” text and will not have such for the foreseeable future. Many researchers think it will require a full simulation of how the mind works before we can write programs that read the way people do.

One of our tasks in DITA labs was to explore the Old Bailey Online site (the API demonstrator) and the Utrecht University Digital Humanities Lab to compare the way in which each used data mining.

I explored the Utrecht University Digital Humanities Lab and was particularly interested in the Dynamics of the medieval manuscript: Text collection from a European Perspective. I noted that the site gives only a synopsis of the project and has no hyperlink to text analysis tools. See screenshot below:

Digital Humanities Lab

By contrast, the Old Bailey Online site allows the direct export of data to Voyant, and you can choose how many search results to export: 10, 50 or 100. See example below:

Old Bailey Online

One of the data sets exported to Voyant Tools produced the data visualization shown below:


Experimenting with the text visualization tools was very interesting!

“Screwing around” with Datasets (Text Analysis)

I know that ‘screwing around’ got you thinking, what the hell is he/she talking about, right? Well, contrary to your dirty little minds, ‘screwing around’ in this context means browsing; you know, when you are not too sure what you are really searching for, as Stephen Ramsay (2014) nicely explains.

What is text analysis? Text analysis tools break a text down into smaller units like words, sentences and passages, and then gather these units into new views on the text that aid interpretation. Geoffrey Rockwell gives a brief history of electronic texts and text analysis. According to Rockwell, a good way to understand text analysis is to look at the tradition of concordancing from which it evolved. A concordance is a standard study tool in which one can look up a word and find references to all the passages in the target work where that word occurs. Concordances are alphabetically sorted lists of the vocabulary of a text (its different words or phrases). Occurrences of each word (the keyword) appear under a headword, each one surrounded by enough context to make out the meaning, and each one identified by a citation to the text that gives its location in the original.
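The concordance described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular tool is built: the function name is hypothetical, the ‘citation’ is just the word's position in the text, and the sample sentence is made up.

```python
import re
from collections import defaultdict


def build_concordance(text, width=3):
    """Map each headword to a list of (position, context) entries.

    position is the word's index in the text (a simple stand-in for a
    citation); context is the keyword in capitals with up to `width`
    words on either side.
    """
    words = re.findall(r"[A-Za-z]+", text)
    concordance = defaultdict(list)
    for i, word in enumerate(words):
        left = words[max(0, i - width):i]
        right = words[i + 1:i + 1 + width]
        context = " ".join(left + [word.upper()] + right)
        concordance[word.lower()].append((i, context))
    # Sort headwords alphabetically, as in a printed concordance.
    return dict(sorted(concordance.items()))


# Hypothetical sample sentence, written only for this example.
sample = "The medium is the message and the message shapes the audience"
concordance = build_concordance(sample, width=2)

for position, context in concordance["message"]:
    print(position, context)
```

Looking up a headword returns every occurrence with its surrounding words and its location, which is the same keyword-in-context view that a printed concordance offers.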

In the DITA lab session, #citylis students had a wonderful experience screwing around with Wordle, Many Eyes and Voyant; these are all JavaScript tools that aid text analysis. The datasets that we were urged to save in previous lab sessions were used to experiment in Wordle and the other tools. Wordle generates word clouds from the datasets that were created. Word clouds add flair to your text by allowing you to change fonts, layout and colour schemes. Below is an example of a text analysis produced in Wordle: (#citylis top tweeters)


Datasets of journal articles inserted into Many Eyes for text analysis created the following:


Datasets of journal articles inserted into Voyant produced the following:


Interesting as it is to work with these JavaScript tools, Jacob Harris views word clouds as harmful to journalism. For what it's worth, I enjoyed screwing around with the text analysis tools!