Accessing ZDB via lobid-resources

04 Sep 2018, Adrian Pohl | 🏷 lobid-resources

A little bird told us that there is some interest in using lobid as interface to the data of the Zeitschriftendatenbank (ZDB) (Union Catalogue of Serials). ZDB is the central bibliographic and holdings database for periodicals in Germany and Austria. It records holdings of nearly all German academic libraries and of many public libraries. Responsible for the technical aspects are the Berlin State Library and the German National Library (DNB). ZDB data is licensed under CC0.

Although a new search ZDB interface for ZDB was recently launched, it does not offer a web API for easy access and use of the data by developers.

This is where lobid comes into play. lobid-resources provides a user interface and web API based on JSON-LD via HTTP for the union catalogue of the North Rhine-Westphalian Library Service Centre (hbz). The hbz union catalogue records approximately 20 Million cooperatively created bibliographic titles plus holding information from libraries in North Rhine-Westphalia and Rhineland-Palatinate. Besides that, the North Rhine-Westphalian Bibliography (NWBib) and – here comes the interesting part for this post – all periodicals from the German Union Catalogue of Serials (ZDB) are included.

Note: Although lobid-resources contains all title records from ZDB it does only contain the holdings information for hbz libraries, thus missing the holdings for libraries in other German federal states and Austria.

The lobid ZDB data

As said, the ZDB data is a subset of lobid-resources, generated from a MAB/XML export of the hbz union catalogue which is currently maintained with an Aleph system. For some background reading see this post about lobid APIs and the JSON-LD behind it and this post about the vocabularies we chose for representing the bibliographic data in RDF. The lobid-resources API documentation contains more details but is currently only available in German.

There also exist RDF representations of ZDB titles offered directly by the ZDB, compare for example the journal “Erkenntnis” at the ZDB (Turtle) and in lobid. RDF from the ZDB itself and from lobid differs in various aspects:

identifiers: URIs in lobid are built based on the hbz ID from the Aleph system while ZDB uses its specific IDs, look for example at the predecessor/successor resp. rdau:P60261/rdau:P60278 triples. lobid contains the ZDB ID for each resource, though, with the key zdbId.
modeling approaches: Alhough both ZDB and hbz/lobid learn from each other and often use the same approach and properties for expressing some information, they sometimes differ e.g. for representing publication date, place and publisher.
information covered: You can find some information in lobid that you won’t find in the ZDB RDF (e.g. record creation/modification information, and the hbz holdings) and the other way round (e.g. RDA content, media, carrier).

Anyway, if you are missing any information or notice bugs, we are happy to adjust, improve and add information in lobid as needed. Just let us know.

Accessing ZDB via lobid

Selecting the ZDB subset

Every ZDB entry in lobid is marked as part of the ZDB “collection” like so:

{
  "inCollection":[
    {
      "id":"http://lobid.org/resources/HT014846970#!",
      "label":"Zeitschriftendatenbank (ZDB)",
      "type":[
        "Collection"
      ]
    }
  ]
}

Accordingly, you can limit searches to the subset of ZDB resources by adding this to the query string:

inCollection.id:"http://lobid.org/resources/HT014846970#!"

Querying the subset

As you can see, ZDB comprises around 1.9 Million entries. To explore the dataset, you can add to the query above as you like. For a start, you can explore the data by using the facets from the HTML view to filter by medium, publication type, publication date or subject heading.

Using the Elasticsearch Query String Syntax you can compose quite complex queries against the data (see also our post on building complex queries, also in German). Some examples:

Using an _exists_ query to get all ZDB resources that have an ISSN: inCollection.id:"http://lobid.org/resources/HT014846970#!" AND _exists_:issn
All ZDB resources published by Elsevier: inCollection.id:"http://lobid.org/resources/HT014846970#!" AND publication.publishedBy:Elsevier
All ZDB resources that have a DDC notation: subject.source.id:"https://d-nb.info/gnd/4149423-4"
Query for a concrete DDC notation is a bit more complex as we have notations from different classifications in the subject array and one has to specify the source classification alongside the actual notation by using the nested query parameter. For example if I want all the resources with notation 100 from the DDC, I have to add the following to the query: &nested=subject:subject.notation:100+AND+subject.source.id:"https://d-nb.info/gnd/4149423-4"

Bulk downloads and post processing

You can use the API’s bulk options to download all the ZDB data or some subsets so you can run some scripts on them locally.

For example, download all ZDB resources with an ISSN and use jq to create a list of all ISSNs in ZDB:

$ curl -H "Accept: application/x-jsonlines" 'http://lobid.org/resources/search?q=inCollection.id%3A%22http%3A%2F%2Flobid.org%2Fresources%2FHT014846970%23%21%22+AND+_exists_%3Aissn' | jq -r .issn[] > zdb-issn.txt

(This will take some time, just took me nearly two minutes, you may speed it up by downloading gzipped data with Accept-Encoding: gzip.)

Let us know if you need help

This post does only cover examples and the other documentation is mostly in German. If you have further questions, contact us by annotating this post with hypothes.is, by mentioning @lobidOrg on Twitter or by writing an email.

Comments? Feedback? Just add an annotation with hypothes.is.