Every Linked Data application builds upon vocabularies. But which ones contain the properties and classes needed for bibliographic descriptions? The topic of this blog post is how we choose specific vocabularies, properties and classes for lobid-resources, what patterns stand behind our choices and the reasons for them.
A grown application profile
When lobid started in 2010, no single bibliographic RDFS vocabulary or OWL ontology existed that catered to all our needs. This is even more true today, as our Linked Data publication carries much more information than it did some years ago. Thus, we either had to create our own ontology or build an application profile from a number of different vocabularies. We opted for vocabulary reuse, as this approach promised to increase interoperability with other linked data sets1. Only when we could not find fitting properties or classes in an existing vocabulary that looked serious and was still maintained would we create new ones in the lobid vocab. We still follow this approach, although we have noticed that some effort is needed to keep up with changes in namespaces (looking at you, RDA!) or with vocabularies that disappear completely. With these experiences in mind, we might follow another approach if we had to start from scratch today: creating an application-specific vocabulary as you go along and aligning it with existing vocabularies later definitely makes you focus on creating a sensible data model in the first place, without taking over problematic models from others.2
For six years, the properties and classes we use developed rather organically, and there was no well-thought-out, documented strategy for choosing them. This changed when we worked on the relaunch of the lobid API: we had to add and replace a lot of properties and finally assessed all properties and classes used in lobid in order to make our application profile as consistent as possible.
Finding & choosing RDF properties/classes
For adding labels and variant names of a resource, we chose rdfs:label and skos:altLabel, as RDFS and SKOS are two widely-used base vocabularies. Typing of linked entities from the Integrated Authority File (GND) is done using the GND Ontology.
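To illustrate the pattern, a description might look roughly like this in Turtle (the resource URI and literals are made up for illustration; the GND URI points to a real entity):

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix gndo:    <https://d-nb.info/standards/elementset/gnd#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical resource URI, for illustration only
<http://example.org/resource/123>
    rdfs:label          "Example title"@de ;
    skos:altLabel       "Variant title"@de ;
    dcterms:contributor <https://d-nb.info/gnd/118540238> .

# The linked GND entity is typed with a class from the GND Ontology
<https://d-nb.info/gnd/118540238> a gndo:DifferentiatedPerson .
```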
Regarding the other elements, the workflow for finding the right thing to reuse goes as follows: We first look for fitting properties and classes mostly using Linked Open Vocabularies as search tool and identify which vocabularies provide things specific enough for our purposes. If multiple vocabs have fitting properties/classes we apply the following ranking to make our choice.
- DC Terms
- Bibframe 2.0
- Bibliographic Ontology (Bibo)
- Resource Description and Access (RDA) Unconstrained Properties
- Several other vocabularies (MADS, Music Ontology, DC Elements,…) for individual elements
- Our own vocabulary
The ranking takes into account different aspects of a vocabulary, such as: How mature is it? Is it well known, and does it have a considerable user group? How stable is it? A criterion for exclusion is when vocabulary URIs do not actually resolve and deliver RDF.
To go into more detail: DC Terms comes first simply because it is a widely adopted standard for basic information about resources.
Since Bibframe is still in development, changes will happen, making it rather unstable for now. However, we are optimistic that the current version 2.0 is stable enough, and – on the plus side – we are able to propose changes and improvements as needed. As there is quite a lot of interest in Bibframe, we also simply wanted to become a bit familiar with it. It has already turned out to be quite valuable, for example by giving us the opportunity to replace some FRBR relicts in our data and to model contributions and roles the way we needed, see this comment.
When the information to be expressed in RDF gets more and more library-specific, the RDA Unconstrained Properties can often help out. We currently use eight of them, e.g. for thesis information, the title of a subseries, or RDA-specific information like the nature of content.
Though we rely on Schema.org as the base vocabulary in lobid-organisations, we stuck to DC, Bibo et al. as basic vocabularies in the context of bibliographic resources. We intend to add schema.org markup embedded in the HTML for use by search engines etc. But schema.org has already convinced us to use its event-based modeling of publication information, see e.g. the "publication" object in this example file.
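A sketch of this event-based modeling in Turtle (the resource URI, dates, and publisher name are invented for illustration; the actual lobid data is published as JSON-LD):

```turtle
@prefix schema: <http://schema.org/> .

<http://example.org/resource/123>
    # Instead of flat publisher/date properties, the publication
    # details are grouped in a schema:PublicationEvent
    schema:publication [
        a schema:PublicationEvent ;
        schema:startDate   "1992" ;
        schema:location    "Köln" ;
        schema:publishedBy "Beispielverlag"  # hypothetical publisher
    ] .
```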
In specific cases we draw properties/classes from other sources, for example using MADS for representing complex subjects or Music Ontology for typing sheet music.
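For instance, a complex (pre-coordinated) subject heading could be represented with MADS/RDF roughly like this, using an RDF list of components (the labels are invented for illustration):

```turtle
@prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#> .

# A subject built from two components, e.g. "Köln -- Geschichte"
[] a madsrdf:ComplexSubject ;
   madsrdf:authoritativeLabel "Köln -- Geschichte"@de ;
   madsrdf:componentList (
       [ a madsrdf:Geographic ; madsrdf:authoritativeLabel "Köln"@de ]
       [ a madsrdf:Topic ;      madsrdf:authoritativeLabel "Geschichte"@de ]
   ) .
```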
Finally, we create properties and classes in our own lobid vocab if other relevant vocabularies don't resolve properly, aren't available in RDF, or if no existing vocabulary provides the necessary means at all. This was the case for 11 classes and 18 properties, e.g. for associating the isPartOf relation of a resource to a series or multi-volume work with the volume number (see this issue), or for expressing dataset-specific information like the internal identifier. The lobid vocab can be found at http://purl.org/lobid/lv and is maintained on GitHub. For convenience, it is written in Turtle.
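The isPartOf-with-volume-number case can be sketched by reifying the relation as a blank node. Note that the property names below are assumptions derived from the keys in the lobid JSON-LD, not the authoritative vocabulary definition, and the URIs are made up:

```turtle
@prefix lv:      <http://purl.org/lobid/lv#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Sketch: the relation to the superordinate series carries the
# volume numbering as its own property
<http://example.org/resource/123>
    dcterms:isPartOf [
        lv:hasSuperordinate <http://example.org/resource/series-9> ;
        lv:numbering        "Vol. 17"
    ] .
```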
Here is an overview of all vocabularies we currently use and how many classes/properties are taken from each:
| Vocabulary | Properties | Classes |
|---|---|---|
| RDA Unconstrained Properties | 8 | - |
| lobid vocab | 18 | 11 |
If you want to know which concrete properties and classes we use, take a look at our JSON-LD context3 or check out the documentation (in German).
If you have comments or suggestions for improvement, we would be interested to hear them.
1 This benefit hasn't manifested itself yet, although we have also contributed to the work of the Gruppe Titeldaten of the DINI-AG KIM on defining a common application profile for Linked Library Data in German-speaking countries.
2 Such an approach was taken in developing the ls.ext library system for the Deichman Library in Oslo. Rurik Greenall has promoted this strategy a lot, e.g. in his ELAG 2015 talk.
3 To get a simple list, run `curl http://lobid.org/resources/context.jsonld | grep '"@id" : "http' | sort -u`.
Comments? Feedback? Just add an annotation with hypothes.is.