Our lobid-gnd service provides access to the Integrated Authority File (GND). The service integrates with OpenRefine, a powerful tool for working with messy data. This tutorial gives an overview of GND reconciliation in OpenRefine. The features used here require OpenRefine 2.8 or later.
Reconciliation is the process of matching name strings to identifiers of entities in a database such as an authority file or Wikidata. This is useful whenever you want to merge differing name strings for the same person in your data, or when you want to fetch additional data from the target database you are reconciling against.
The first step in the reconciliation process is to create a project. OpenRefine can import data from various sources. For this tutorial, we’ll simply import data from the clipboard:
Copy these lines and paste them in OpenRefine:
name;beruf;ort
J. Weizenbaum;Informatiker;Berlin
Twain, Mark;Schriftsteller;
Kumar, Lalit;;
Jemand;;
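To see what OpenRefine's import preview has to detect here, the following minimal Python sketch parses the same sample with the standard `csv` module. It is only an illustration outside OpenRefine; the names `SAMPLE` and `parse_sample` are made up for this example. Note that the delimiter is the semicolon, so the comma in `Twain, Mark` stays part of the name.

```python
import csv
import io

# The sample data from this tutorial, one record per line.
SAMPLE = """name;beruf;ort
J. Weizenbaum;Informatiker;Berlin
Twain, Mark;Schriftsteller;
Kumar, Lalit;;
Jemand;;
"""


def parse_sample(text):
    """Parse semicolon-separated text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


rows = parse_sample(SAMPLE)
for row in rows:
    # Empty cells come back as empty strings.
    print(row)
```

Splitting on commas instead would break the names apart, which is why the detected separator is worth checking in the preview.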
In the following preview screen you can accept the automatically detected settings and create the project:
We now want to reconcile the text strings in the name column with GND entries:
We first have to add the GND reconciliation service, using https://lobid.org/gnd/reconcile as the service URL:
Collapse the drawer on the left-hand side by clicking the newly added service. As our list for reconciliation consists solely of personal names, we now select Person to reconcile only against GND entries of that type:
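Behind the dialog, OpenRefine sends the cell values to the service as a batch of queries. The sketch below builds such a batch in Python, assuming the service follows the general reconciliation service API (a form-encoded `queries` parameter holding a JSON object of keyed queries). The type identifier `Person`, the `limit` of 3, and the helper names are assumptions for illustration; nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Service URL from this tutorial; shown for context, not contacted here.
SERVICE_URL = "https://lobid.org/gnd/reconcile"


def build_queries(names, entity_type="Person", limit=3):
    """Return a dict of reconciliation queries keyed q0, q1, ..."""
    return {
        f"q{i}": {"query": name, "type": entity_type, "limit": limit}
        for i, name in enumerate(names)
    }


def encode_request_body(queries):
    """Form-encode the batch as the `queries` parameter (assumed API shape)."""
    return urlencode({"queries": json.dumps(queries)})


names = ["J. Weizenbaum", "Twain, Mark", "Kumar, Lalit", "Jemand"]
queries = build_queries(names)
body = encode_request_body(queries)
print(json.dumps(queries, indent=2))
```

Restricting every query to one type narrows the candidates the service returns, which is what selecting a type in the dialog achieves.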
For real-world data it can make sense to pass additional data from other columns to improve the reconciliation results (the value in the text box is arbitrary here, but must not be empty):
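As a rough sketch of what passing additional columns means, the function below attaches other cell values to a query as `properties` entries, assuming the `pid`/`v` shape of the general reconciliation service API. Using the column names `beruf` and `ort` as property IDs is an arbitrary choice for this example, mirroring the note above that the value must merely not be empty; `with_properties` is a hypothetical helper.

```python
def with_properties(query, extra):
    """Return a copy of `query` with non-empty extra column values attached.

    `extra` maps a property ID (arbitrary here, but not empty) to a cell value.
    """
    props = []
    for pid, value in extra.items():
        if not pid:
            raise ValueError("property ID must not be empty")
        if value:  # skip empty cells, they carry no information
            props.append({"pid": pid, "v": value})
    enriched = dict(query)
    if props:
        enriched["properties"] = props
    return enriched


row = {"name": "J. Weizenbaum", "beruf": "Informatiker", "ort": "Berlin"}
query = with_properties(
    {"query": row["name"], "type": "Person"},
    {"beruf": row["beruf"], "ort": row["ort"]},
)
print(query)
```

Rows with empty `beruf` and `ort` cells simply produce a query without a `properties` key.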
After reconciliation, we can inspect candidates that were not matched automatically by clicking their names:
This brings up a preview, with the option to match them:
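The distinction between automatically matched cells and candidates left for review can be sketched as follows, assuming each candidate carries `name`, `score` and `match` fields as in the general reconciliation service API. The sample candidates are invented placeholders, not real GND entries, and the function names are hypothetical.

```python
def pick_auto_match(candidates):
    """Return the first candidate flagged as a match, or None if review is needed."""
    for candidate in candidates:
        if candidate.get("match"):
            return candidate
    return None


def review_list(candidates):
    """Return (name, score) pairs sorted by descending score for manual inspection."""
    ranked = sorted(candidates, key=lambda c: c.get("score", 0), reverse=True)
    return [(c["name"], c.get("score", 0)) for c in ranked]


# Invented placeholder candidates, for illustration only.
sample_candidates = [
    {"id": "placeholder-id-1", "name": "Candidate A", "score": 42.0, "match": False},
    {"id": "placeholder-id-2", "name": "Candidate B", "score": 57.5, "match": False},
]

auto = pick_auto_match(sample_candidates)
pending = review_list(sample_candidates)
print(auto, pending)
```

When no candidate is flagged, the ranked list is what a person would look through before choosing a match by hand.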
After matching, we can enrich our data with the reconciled data. We want to add columns based on the reconciled values:
We can now select the properties we want to add and preview them. Here, we choose Beruf oder Beschäftigung, among other properties.
The first three properties are GND entries themselves, so they are recognized as reconciled items (they are links in the preview).
For non-reconciled items that have a label and an ID in lobid-gnd (such as Ländercode), we can configure the content we want (label or ID) using the configure link for that property:
Note also the limit setting, which works for all properties and limits the number of values added for each entry (0 is the default, meaning no limit).
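The semantics of the limit setting can be sketched in a few lines of Python. The `apply_limit` function only models the behaviour described above (0 means no limit). The `build_extend_request` helper assumes the request shape of the general data extension API (`ids` plus `properties` with optional `settings`); the property ID `professionOrOccupation` and the entry ID are placeholders assumed for this example.

```python
def apply_limit(values, limit=0):
    """Keep at most `limit` values per entry; 0 (the default) means no limit."""
    if limit < 0:
        raise ValueError("limit must not be negative")
    values = list(values)
    return values if limit == 0 else values[:limit]


def build_extend_request(ids, properties):
    """Build a data extension request body (assumed general API shape).

    `properties` is a list of (property_id, settings_dict) pairs.
    """
    return {
        "ids": list(ids),
        "properties": [
            {"id": prop_id, "settings": settings} if settings else {"id": prop_id}
            for prop_id, settings in properties
        ],
    }


occupations = ["value 1", "value 2", "value 3"]
print(apply_limit(occupations))      # no limit: all three values
print(apply_limit(occupations, 2))   # at most two values

request = build_extend_request(
    ["placeholder-id"],                           # placeholder entry ID
    [("professionOrOccupation", {"limit": 1})],   # assumed property ID
)
print(request)
```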
After confirming the preview (and after removing old columns such as ort and filtering out the non-reconciled item using the facet on the left-hand side), we have the enriched table with new data:
We can now use the new reconciled items (like Berlin in the Sterbeort column here) to add more columns based on their properties (i.e. properties of those items, such as Berlin).
As an example, we add a link to a depiction of the reconciled item:
Finally, we can export our data in various supported formats:
Comments? Feedback? Just add an annotation with hypothes.is.