A brief introduction to Web 3.0…

In Week 9, our attention turned towards the future of digital information and, more specifically, towards the Semantic Web or ‘Web 3.0.’ This vision of how information should be made both human- and machine-readable first appeared a lot longer ago than I could have guessed, with Wikipedia dating the beginning of the movement to sometime back in the 1960s. Considering the pace at which technology has advanced and how the exchange of information has progressed with the birth of the internet and the World Wide Web, it seems a little surprising that this idea has not gained a lot more ground.

The main feature of the Semantic Web is that the relationships between different bits of information are themselves recorded as information, making them ‘understandable’ to the machine. This would enable far more accurate information retrieval. Instead of searching for key terms and sifting through results that use those terms in very different contexts, you would be able to search for the relationships between the search terms; the search engine would ‘understand’ what that means and begin to filter out the results which don’t fit the specific context searched for. This shift towards a semantically organised Web would obviously involve using mark-up languages in a different way, and we began to try to understand how this vision of the Web might be possible.

As part of the lab session we explored a website called Artists’ Books Online, “an online repository of facsimiles, metadata and criticism.”


This site adopts a DTD (document type definition), which acts as a template to ensure that the correct information for each artefact is present. It largely follows a format which reflects traditional bibliography, requiring information which describes the book, its author, publisher, place of publication etc. This style makes it easier to visualise how the Semantic Web might work: we can look at the information given for a chosen item and see what tags could be added to the site’s data to make the relationships between pieces of information more machine-understandable. As Ernesto (via Johanna Drucker) details in our lab notes,

“Linked Data” or “data linkage” on the Web is mainly achieved through a standard known as the Resource Description Framework or RDF…There are a number of different technical ways to express RDF (for example XML), but the basic concept is that things are described through “triples” which take the form of Subject – Predicate – Object sentences.”

Using the information from this site, we were able to create our own ‘triples,’ e.g.

Subject: “Monuments to the Industrial Revolution”

Predicate: was authored by

Object: Charles Agel
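To see how a machine might actually use a triple like this, here is a minimal sketch in Python. It is only an illustration under some assumptions: real RDF identifies subjects and predicates with URIs rather than plain strings, and the `Triple` class, the `match` helper and the second triple in the list are all hypothetical names and data made up for this example, not anything taken from Artists’ Books Online.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One Subject - Predicate - Object statement (plain strings, not URIs)."""
    subject: str
    predicate: str
    obj: str


# The first triple is the one from the lab session.
# The second is a hypothetical extra statement, added only for illustration.
TRIPLES = [
    Triple("Monuments to the Industrial Revolution", "was authored by", "Charles Agel"),
    Triple("Monuments to the Industrial Revolution", "is a", "artist's book"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every field that was supplied.

    A field left as None acts as a wildcard, so the search is on the
    relationship itself rather than on loose keywords.
    """
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which things were authored by Charles Agel?"
for triple in match(TRIPLES, predicate="was authored by", obj="Charles Agel"):
    print(triple.subject)
```

Because the relationship is stored as data, the query asks for everything linked to Charles Agel by authorship, rather than for every page that happens to contain his name.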


Old Bailey Online Revisited

Our 8th DITA lecture covered data mining and included our first guest speaker, Ulrich Tiedau, who works on a research project focusing on digital humanities and reference cultures, which uses data mining to carry out its research. It was a useful and interesting insight into how data mining is being used by academics and librarians for research purposes.

The aims of our lab session were to explore the Old Bailey Online resource further and to compare it with one of the data-mining research projects hosted by Utrecht University, of which Ulrich’s was one.

We’d previously used Old Bailey Online when learning about APIs, so I was already familiar with the way it works, and fortunately it is a user-friendly site. It’s an incredibly interesting one at that, and too much of a temptation for avid procrastinators like myself. To start off, I tried out a few searches to refamiliarise myself with the search function. It very helpfully has a dropdown menu with suggestions for the subject lines of offence, verdict and sentence. I opted to search by offence, selecting ‘concealment of a birth,’ which produced 543 results. In order to export this data I had to recreate the search via the API demonstrator on another part of the site, where the search function is slightly different; for example, it offers the additional option of choosing the gender of the victim. I carried out the same search but also filtered the results to include only guilty verdicts, and the result was 365 hits.

[Image: Old Bailey API results]

The API demonstrator links to the data mining tool Voyant (as well as Zotero), which we explored in the lecture the week before, allowing you to transfer the data at the click of a button and making the process far less labour-intensive. It also includes a “More Like This function that allows you to build new searches based on a Text Frequency – Inverse Document Frequency (TF-IDF) methodology” which further assists with any text analysis you may wish to carry out using the site’s content.
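For anyone wondering what TF-IDF actually does, here is a minimal sketch in Python of the textbook version of the calculation. I don’t know how the Old Bailey site implements its own function, so this is an assumption-laden illustration rather than their method; the `tf_idf` function name and the three toy sentences are made up for the example and are not taken from the Old Bailey records.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one {term: score} dictionary per document.

    tf  = how often a term occurs in a document, divided by the document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "the prisoner was found guilty",
    "the prisoner was acquitted",
    "the jury found the verdict",
]

for number, doc_scores in enumerate(tf_idf(docs), start=1):
    best_term = max(doc_scores, key=doc_scores.get)
    print(number, best_term)
```

A word that appears in every document (like “the” here) scores zero, while a word that is frequent in one document but rare elsewhere scores highly, which is what makes the measure useful for finding documents that are “more like this.”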

In the lab session itself, I wasn’t able to use the Voyant function as it was being used by upwards of 30 people at once, but on trying again at home I was able to transfer the first 100 documents successfully, which is the maximum number the function allows.

[Image: Voyant Old Bailey]

I explored the functionality of Voyant a little more than last time to see what content I could extract that might be useful or meaningful. Unfortunately, some of the searches weren’t possible because there were too many documents; for example, I wasn’t able to view where instances of a particular word occurred across the whole corpus. It is still quite an impressive tool, but I found it a bit hard to test how useful or effective it was without a specific aim in mind.
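The idea of locating a word across a corpus can be sketched in a few lines of Python. This is not how Voyant works internally (I have no idea about that); it is just a simple illustration, and the `find_word` function and the two toy sentences are hypothetical, invented for the example.

```python
from typing import List, Tuple


def find_word(corpus: List[str], word: str) -> List[Tuple[int, int]]:
    """Return (document index, token position) for each occurrence of `word`.

    Matching ignores case and any punctuation attached to a token.
    """
    word = word.lower()
    return [
        (doc_index, position)
        for doc_index, text in enumerate(corpus)
        for position, token in enumerate(text.lower().split())
        if token.strip(".,;:!?\"'") == word
    ]


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "The child was found.",
    "No child here; the child slept.",
]

for doc_index, position in find_word(corpus, "child"):
    print(f"document {doc_index}, token {position}")
```

With a real corpus of 100 trial texts, the same list of positions is what a tool would need in order to draw a chart of where a word clusters.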

In the second part of the lab, we were asked to compare the Old Bailey Online to one of the data-mining research projects from Utrecht University. Having just heard Ulrich speak, I decided to take a look at the website of the research project he had spoken about. The mission statement outlines the project and its purpose:

“The program uses digital humanities tools to analyze how the United States has served as a cultural model for the Netherlands in the long twentieth century.”

The method by which they aim to carry out this analysis is data-mining newspapers which have been digitised, measuring long-term trends in national discourses. This would be near impossible without being able to data-mine digitised text. The only alternative would be to read through hard-copy newspapers one by one, picking out terms and trends by hand. Unfortunately, with data-mining being in its infancy, not all of the material required has been digitised, and of that which has been, not all of it has an API available to use, free of charge or otherwise.
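As a rough idea of what “measuring long-term trends” could look like in practice, here is a minimal sketch in Python that works out, for each year, the share of articles mentioning a given term. It is an assumption on my part that a trend could be measured this simply; the project’s real methods are certainly more sophisticated. The `term_share_by_year` function and the toy articles are hypothetical and invented for this example, not taken from any newspaper archive.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def term_share_by_year(articles: List[Tuple[int, str]], term: str) -> Dict[int, float]:
    """Map each year to the fraction of that year's articles mentioning `term`."""
    totals = defaultdict(int)  # number of articles per year
    hits = defaultdict(int)    # number of articles per year containing the term
    term = term.lower()
    for year, text in articles:
        totals[year] += 1
        if term in text.lower().split():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical toy articles as (year, text) pairs, invented for illustration only.
articles = [
    (1900, "news from america today"),
    (1900, "local market prices"),
    (1950, "america and the netherlands"),
]

for year, share in term_share_by_year(articles, "america").items():
    print(year, share)
```

Run over thousands of digitised articles rather than three made-up ones, a table like this is the sort of thing that would take years to compile by reading hard copies.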

Compared to the Old Bailey Online, this project is of a very different nature, primarily because it involves data-mining numerous external sources and collating that information over a long period in an attempt to measure trends in national discourse. Old Bailey Online has only one source to focus on, the archives of London’s Central Criminal Court, and aims to digitise that data and make it available for text-mining, serving more as a source for research projects than as one itself.