GND reconciliation for OpenRefine

27 Aug 2018, Fabian Steeg, Adrian Pohl | 🏷 lobid-gnd 

Our lobid-gnd service provides access to the Integrated Authority File (GND). The service integrates with OpenRefine, a powerful tool for working with messy data. This tutorial provides an overview of GND reconciliation for OpenRefine. The features used here require OpenRefine 2.8 or later.

Reconciliation is the process of matching name strings to identifiers of entities in a database, such as an authority file or Wikidata. This is useful whenever you want to merge differing name strings for the same person in your data, or when you want to fetch additional data from the database you are reconciling against.

The first step in the reconciliation process is to create a project. OpenRefine can import data from various sources. For this tutorial, we’ll simply import data from the clipboard:

1

Copy these lines and paste them in OpenRefine:

name;beruf;ort
J. Weizenbaum;Informatiker;Berlin
Twain, Mark;Schriftsteller;
Kumar, Lalit;;
Jemand;;

2

In the following preview screen you can accept the automatically detected settings and create the project:

3
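To see what the detected import settings amount to, the sample data can also be parsed outside OpenRefine. The following is a minimal sketch, assuming the relevant settings are a semicolon separator and a header in the first line (the function name `parse_sample` is only illustrative):

```python
import csv
import io

# The sample lines from this tutorial, as they would sit in the clipboard.
SAMPLE = """name;beruf;ort
J. Weizenbaum;Informatiker;Berlin
Twain, Mark;Schriftsteller;
Kumar, Lalit;;
Jemand;;
"""


def parse_sample(text):
    """Parse semicolon-separated text whose first line holds the column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


rows = parse_sample(SAMPLE)
for row in rows:
    print(row)
```

The semicolon separator matters here because names like `Twain, Mark` contain a comma; empty cells come back as empty strings.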

We now want to reconcile the text strings in the name column with GND entries:

4

We’ll have to add the GND reconciliation service:

5

Paste https://lobid.org/gnd/reconcile as the service URL:

6

Collapse the drawer on the left-hand side by clicking the newly added service. Since our list consists solely of personal names, we now select Person to reconcile only against GND entries of type Person:

7
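Behind the dialog, OpenRefine sends the values of the selected column to the service as a batch of queries. The sketch below builds such a batch in the shape described by the generic reconciliation service API (a `queries` form parameter holding a JSON object). That the type is identified as `Person`, and the `limit` of 3, are assumptions for illustration; nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode

SERVICE_URL = "https://lobid.org/gnd/reconcile"  # service URL from this tutorial


def build_queries(names, type_id="Person", limit=3):
    """Build one query per name, keyed by an arbitrary query ID (q0, q1, ...)."""
    return {
        f"q{i}": {"query": name, "type": type_id, "limit": limit}
        for i, name in enumerate(names)
    }


def encode_form(queries):
    """Encode the batch as a form body with a single `queries` parameter."""
    return urlencode({"queries": json.dumps(queries)})


queries = build_queries(["J. Weizenbaum", "Twain, Mark"])
body = encode_form(queries)
print(json.dumps(queries, indent=2))
```

The resulting `body` could be sent as a POST request to `SERVICE_URL` with any HTTP client.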

For real-world data it can make sense to pass additional data from other columns to improve the reconciliation results (the value in the text box is arbitrary here, but must not be empty):

8
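In terms of the generic reconciliation service API, values from other columns travel with the query as a `properties` list of `pid`/`v` pairs. The following sketch shows how one row might be turned into such a query. The helper `build_query` is hypothetical, and the property names simply reuse the column names, since the name entered in the text box is arbitrary here.

```python
import json


def build_query(row, name_column, property_columns, type_id="Person"):
    """Turn one table row into a reconciliation query.

    property_columns maps a column name to the property name sent along
    with it. Empty cells are skipped.
    """
    properties = [
        {"pid": pid, "v": row[column]}
        for column, pid in property_columns.items()
        if row.get(column)
    ]
    query = {"query": row[name_column], "type": type_id}
    if properties:
        query["properties"] = properties
    return query


row = {"name": "J. Weizenbaum", "beruf": "Informatiker", "ort": "Berlin"}
query = build_query(row, "name", {"beruf": "beruf", "ort": "ort"})
print(json.dumps(query, ensure_ascii=False, indent=2))
```

Skipping empty cells keeps rows like `Kumar, Lalit;;` from sending empty property values.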

After reconciliation, we can inspect candidates that were not matched automatically by clicking their names:

9

This brings up a preview, with the option to match them:

10
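The distinction between automatically matched cells and cells that need review corresponds to the `match` flag on each candidate in a reconciliation response. The sketch below sorts a response accordingly. The response shape follows the generic reconciliation service API; the IDs and scores in `sample_response` are invented placeholders, not real GND data.

```python
def split_candidates(response):
    """Separate queries with an automatic match from those needing review."""
    matched, to_review = {}, {}
    for query_id, payload in response.items():
        candidates = payload.get("result", [])
        # A candidate flagged with match=True is treated as an automatic match.
        auto = next((c for c in candidates if c.get("match")), None)
        if auto is not None:
            matched[query_id] = auto
        else:
            to_review[query_id] = candidates
    return matched, to_review


# Hypothetical response: IDs and scores are placeholders for illustration only.
sample_response = {
    "q0": {"result": [
        {"id": "example-id-1", "name": "Twain, Mark", "score": 100.0, "match": True},
    ]},
    "q1": {"result": [
        {"id": "example-id-2", "name": "Weizenbaum, Joseph", "score": 60.0, "match": False},
        {"id": "example-id-3", "name": "Weizenbaum, J.", "score": 40.0, "match": False},
    ]},
    "q2": {"result": []},
}

matched, to_review = split_candidates(sample_response)
print("matched:", sorted(matched))
print("to review:", sorted(to_review))
```

Queries with an empty candidate list (like `q2`) end up in the review group as well, which mirrors cells that stay unmatched in the table.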

After matching, we can enrich our data with the reconciled data. We want to add columns based on the reconciled values:

11

We can now select the properties we want to add and preview them. Here, we choose Beruf oder Beschäftigung, Geburtsort, Sterbeort, and Ländercode:

12

The first three properties are GND entries themselves, so they are recognized as reconciled items (they are links in the preview).

For non-reconciled items that have a label and an ID in lobid-gnd (such as Ländercode), we can configure the content we want (label or ID) using the configure link for that property:

13

Note also the limit setting, which works for all properties and limits the number of values added for each entry (0 is the default, meaning no limit).
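The property selection and its per-property settings can be pictured as a data extension request, as defined by OpenRefine's data extension API (`ids` plus a list of `properties`, each with optional `settings`). This is only a sketch under assumptions: the property identifiers, the setting keys `limit` and `content`, their value format, and the entity ID are illustrative and should be checked against what the service itself reports.

```python
import json


def build_extend(entity_ids, properties):
    """Build a data extension request.

    properties is a list of (property_id, settings) pairs, where settings
    may be None for properties without configuration.
    """
    props = []
    for prop_id, settings in properties:
        prop = {"id": prop_id}
        if settings:
            prop["settings"] = dict(settings)
        props.append(prop)
    return {"ids": list(entity_ids), "properties": props}


# Assumed identifiers for the four properties chosen in this tutorial.
extend = build_extend(
    ["example-id-1"],  # placeholder for the ID of a reconciled entry
    [
        ("professionOrOccupation", None),
        ("placeOfBirth", None),
        ("placeOfDeath", {"limit": 1}),
        ("geographicAreaCode", {"limit": 0, "content": "id"}),
    ],
)
print(json.dumps(extend, indent=2))
```

In this sketch a `limit` of 0 stands for "no limit", matching the default described above.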

After confirming the preview (and removing the old columns beruf and ort, as well as filtering out the non-reconciled item using the facet on the left-hand side), we have the enriched table with new data:

14

We can now use the new reconciled items (like Berlin in the Sterbeort column here) to add more columns based on their properties (i.e. properties of Berlin, not Weizenbaum, Joseph):

15

As an example, we add a link to a depiction of the Sterbeort:

16

Finally, we can export our data in various supported formats:

17

This concludes our overview of GND reconciliation in OpenRefine. For further information, check out the general OpenRefine documentation and the reconciliation wiki page.

Comments? Feedback? Just add an annotation with hypothes.is.