Digital Humanities & Social Sciences Workshop

Textual Digital Humanities and Social Sciences: Data > Interpretation > Understanding

Monday 21st – Tuesday 22nd September 2015

Sir Duncan Rice Library, University of Aberdeen

Workshop Chair: Dr Adam Wyner, Computing Science, University of Aberdeen

Table of Contents

Motivation and Topics

In the United Kingdom and in Europe, there are large corpora of historical and contemporary texts that have been digitised and rendered machine-processable, allowing the texts to be enriched with meta-data, made available over the Internet, and linked.  Once texts are rendered machine-readable, a variety of approaches can be applied to them, including Natural Language Processing tools, Linguistic Open Linked Data, the Text Encoding Initiative, and similar.  The result of such applications is richly augmented and articulated corpora.  Infrastructure has been put in place to facilitate access and reuse.  Such corpora can be used in a variety of ways – presentation in web-pages, query, transmission, mash-up, amongst others – allowing researchers to explore, interpret, and understand the content, addressing issues and questions in the humanities and social sciences.

Yet the abundance of particular information that the distributed digital corpora contain seems to be under-explored and under-utilised.  In particular, it is unclear how researchers in digital humanities and social science use the available enriched, linked corpora to address research issues and questions.  What questions can researchers ask of the resources?  Do the results that are received address the questions?  How are researchers’ issues and questions served (or not) by current augmented and articulated corpora?  How ought the sorts of meta-data change to better reflect researchers’ interests and requirements?

The Textual Digital Humanities and Social Sciences September Workshop aims to highlight recent work on textual digital humanities and social sciences while providing a networking opportunity for colleagues in Scotland, the United Kingdom, and Europe.

Return to Table of Contents

Place and Time

Held at the iconic Sir Duncan Rice Library, University of Aberdeen, Aberdeen.

Preliminary schedule:

Funding Acknowledgement

The workshop was funded by a generous grant from The dot.rural Digital Economy Hub, which itself is supported by the award made by the RCUK Digital Economy programme, award reference: EP/G066051/1. We thank dot.rural for its support.

Invited Speakers

Jonathan Blaney, University of London, United Kingdom

Bio: Jonathan Blaney is the Project Editor of the Institute of Historical Research at the University of London. Since 2007 he has been part of British History Online’s project to complete the digitisation of the Calendars of State Papers. He continues to work part of the time on BHO, as well as spending time on Connected Histories and IHR web projects. His interests include text editing and mark-up schemes such as the TEI; XML; the culture and technology of annotation; and the digital representation of print media.

Abstract: This talk will discuss the Digging into Linked Parliamentary Data project, its scope and methodology, and then move on to discuss the problems that resource providers of historical materials face in determining what their users want and how their resources are used (and therefore how they can be improved to better meet those needs), as well as touching on the problems of obtaining funding in the humanities for projects which are as unglamorous as they are necessary.

Georgeta Bordea, National University of Ireland, Ireland

Bio: Georgeta Bordea is a Post Doctoral Researcher at The Insight Centre for Data Analytics in the National University of Ireland. She works in the Unit of Natural Language Processing (UNLP) and her research interests include term extraction, taxonomy construction, expert finding, ontology learning and population, and domain adaptation of NLP applications. She is one of the main contributors of Saffron, a system that gives an overview of research topics and knowledgeable people in different research areas.

Abstract: Regulatory compliance is a daunting challenge in many industries, but none more so than the financial industry, which is undergoing major regulatory change across the globe in the aftermath of the 2008 Financial Crisis. Semi-automated solutions based on semantic technologies are called for to assist subject matter experts in identifying, classifying and making sense of these regulatory changes. In this talk we will give an overview of a complete pipeline for automated compliance verification and describe in more detail a recent approach for semantic annotation of regulations based on multi-label classification.

Jelle Haemers, University of Leuven, Belgium

Bio: Trained as an urban historian, Prof. Dr. Jelle Haemers wrote his first book on the Ghent revolt of 1449-53. In recent years his research interests have widened to encompass other kinds of social and political conflicts in the late medieval town, notably in the Low Countries (1100-1600). He has also published on the use of social theory and auxiliary sciences in history, the late medieval nobility and the financial history of court and towns. He has just completed his second book, on the political conflict between the Flemish cities and Maximilian of Austria in the 1480s (For the Common Good. State Power and Urban Revolts in the Reign of Mary of Burgundy, 1477-1482), which was awarded the prestigious ‘Frans van Cauwelaert Prize’ of the Royal Academy of Arts and Sciences of Belgium. He is a member of the Young Academy of Belgium.

Abstract: In this talk I will focus on the collaboration between the university and the city archives of Leuven. Both partners invest a lot of money, time and energy in the Itinera Nova project, which publishes late medieval juridical sources. The project is not just about the digitisation of ancient texts; it has also succeeded in constructing a huge network of volunteers and professional historians with the aim of studying the late medieval and early modern history of Leuven. Some good practices will be dealt with, though there are also some ‘threats’ to the continuation of the project.

David Milward, Linguamatics, United Kingdom


Bio: David Milward is chief technology officer (CTO) at Linguamatics. He is a pioneer of interactive text mining, and a founder of Linguamatics. He has over 20 years’ experience of product development, consultancy and research in natural language processing (NLP). After receiving a PhD from the University of Cambridge, he was a researcher and lecturer at the University of Edinburgh. He has published in the areas of information extraction, spoken dialogue, parsing, syntax and semantics.

Abstract: The last ten years have seen substantial use of text mining to exploit information from scientific literature. More recently, medics and health researchers have been keen to achieve “meaningful use” of huge numbers of electronic health records. This has been partly achieved by hand-coding the text, partly by text mining. This talk will review some of the use cases and specific challenges for text mining in the pharmaceutical and healthcare sectors, describe an agile text mining approach, and discuss possible lessons for exploitation of digital corpora in the humanities.

Ioannis Panagis, University of Copenhagen, Denmark

Bio: Ioannis Panagis is a data specialist at the Centre of Excellence for International Courts (iCourts), Faculty of Law, University of Copenhagen. He has been working on a project at iCourts, concerning the use of corpora to answer research questions involving international courts.

Abstract: Researchers in international courts are using case law corpora in an attempt to understand, inter alia, how courts legitimize their existence, what the key concepts developed in the case law are, how the concepts evolve through time, what the network of citations to precedent is, in which contexts certain set phrases appear, and so on. In this talk we present a few examples of how we at iCourts are using corpora as a complementary method and as the basis to answer the questions raised above. We will present different methods that we use as well as the challenges that we meet along the way.

Pip Willcox, University of Oxford, United Kingdom

Bio: Pip Willcox is the Co-ordinator of the Centre for Digital Scholarship at the Bodleian Libraries, University of Oxford, a role partly funded by the University of Oxford e-Research Centre. She is an associate member of SOCIAM: The Theory and Practice of Social Machines as well as being a Co-Investigator of Early English Print in the HathiTrust, a Linked Semantics Worksets prototyping project. She is a Co-Director of the annual Digital Humanities at Oxford Summer School, the Programme Chair of the TEI 2015 Conference and serves on the TEI Board of Directors, on the steering committee of Cultures of Knowledge, the Advisory Board of Digital Renaissance Editions and on the Institute for Historical Research’s Library Committee. Her background is in scholarly editing, and her current research interests include the interplay between the analogue and the digital, materiality, and citizen scholarship.

Abstract: Digitized text has revolutionized early modern studies, in textual presentation, resource discovery, transmission, and reuse. Since 2000, Early English Books Online Text Creation Partnership (EEBO-TCP) has been producing hand-transcribed, TEI-encoded digital editions of each unique title printed in English from 1473 to 1700. January 2015 saw the public release of the first 25,000 of these texts. Using EEBO-TCP as a case study, this presentation outlines the analogue and digital contexts of the texts’ and metadata’s provenance, their means of production and reception, and the implications these have for extracting, collating, and enhancing them for use at scale. It suggests some features of curation, access, and enhancement that could usefully be considered as the texts look to a more social and open future engaging with communities of interest.

Martin Wynne, University of Oxford, United Kingdom

Bio: Martin Wynne is Senior Research Support Officer at the University of Oxford. Martin is based in IT Services, where he is responsible for the Oxford Text Archive, which also involves managing the distribution of the British National Corpus (BNC). He currently has roles in the Oxford e-Research Centre, where he works as part of the Digital Humanities at Oxford initiative, and TORCH, where he works on digital strategies. Martin is also a member of the Faculty of Linguistics, Philology and Phonetics. Martin is a Director of CLARIN, which is building a pan-European research infrastructure for research with language resources in the Humanities. Martin’s current research and teaching focus on corpus linguistics, and developing infrastructure to support the use of language corpora and large data collections.

Abstract: The good news is that we can see the possibilities of far-reaching transformations in humanities scholarship to be brought about by the data deluge, faster and better computing infrastructure and the steady spread of digital tools and techniques. The not-so-good news is that it isn’t happening in as widespread and effective a way as we might have hoped. What is being done to overcome the current barriers, and what more do we need to do? In particular, I’ll examine how the CLARIN research infrastructure is trying to address the problems in respect of linguistic data and software, and the ongoing difficulties of promoting and supporting digital research in the humanities. And is it now time to ask, do humanities scholars really want to scale up their research projects, work in interdisciplinary spaces, embrace new technologies, and take on grand challenges and big questions?


Please register for the workshop using Eventbrite. Registration is free.

Registration for Textual Digital Humanities and Social Sciences Workshop

PostGrad Bursary Application

To help defray costs, we have bursaries of £150.00 available to Postgraduates who wish to attend and participate in the workshop. Please apply on the form:

Form for PostGrad Bursary Applications (Google Form)

Expense Claims

To claim expenses, download, fill in, and sign the Expense Form. All the original receipts must be returned to Jennifer Dick by post to:

Jennifer Dick
Room 824, MacRobert Building
University of Aberdeen
King’s College
Aberdeen, AB24 3UA, United Kingdom

Jennifer will code the expenses and forward your payment.

I recommend that you copy all the receipts for your own records. Should you have a question or problem, contact me or Jennifer Dick – TDHSS – Expense Claim.


The Twitter hashtag is: #TDHSS

Shared Folder (Google Docs)

There is a shared Google Folder available to workshop participants for comments and files. You can upload files, create files, and edit existing files. When you follow the link, you can read the document. If you want to edit the document or upload files, click on the “Open in Drive” link on the document page; this takes you to Google Drive, where there is a range of other functions. In the Slides Folder, you can upload and download presentation slides.

Shared Google Drive folder


To Aberdeen by air, rail and road

Aberdeen’s international airport is served by a number of major carriers, providing an extensive network of routes throughout the UK, direct to Europe and worldwide through major hubs. British and Irish destinations include four London airports (Heathrow, Gatwick, London City and Luton – just over an hour’s flight time), Belfast, Birmingham, Bristol, Dublin, Durham Tees Valley, Exeter, Humberside, Leeds Bradford, Manchester, Newcastle, Norwich, East Midlands, Southampton, as well as the Scottish Highlands and Islands. European mainland destinations include Amsterdam, Copenhagen, Frankfurt, Oslo and Paris. A frequent bus link and taxis are available to take you to the city centre. For more information on routes visit the Aberdeen Airport website.

Rail services connect Aberdeen both north and south. There are regular direct trains to London, and services from Edinburgh and Glasgow link with other mainline routes. Aberdeen is served by the Caledonian Sleeper as well as daytime trains. For timetable information visit National Rail Enquiries.

Aberdeen is served by coaches from Edinburgh, Glasgow and other destinations; coach travel from the rest of the UK involves connecting through Edinburgh or Glasgow. Details are available from Citylink (covers all operators in Scotland) and individual operators’ websites.

Locally, the University is served by First Bus (routes 1, 2, 19, 20, X40 amongst others) and Stagecoach buses from Union Street and Union Square bus station in the City Centre.

Within Aberdeen – Campus, City Centre, Restaurant

Bus Information from City Centre to University of Aberdeen Rice Library

Dinner Reservation at Zizzi’s and Travel information


There is no specific workshop hotel, allowing participants to select their own. It is recommended to book accommodation as soon as possible as there is high demand for hotel rooms in Aberdeen.

Options easily accessible from the University include:
