lobid 2.0

Building a LOD-based web API

Adrian Pohl / @acka47
Linked Open Data, Hochschulbibliothekszentrum NRW (hbz)


Show & Tell, World Wide Web, 2017-07-27

This presentation:
http://slides.lobid.org/lobid-show-and-tell/

Creative Commons License

Agenda

  • lobid
  • lobid-resources / Demo
  • API Documentation
  • Lessons Learned
  • Q & A

The hbz

North Rhine-Westphalian Library Service Centre, est. 1973

Software services for libraries in NRW and beyond

E.g. union catalog, discovery portal DigiBib, ILL, Digitization & Digital Preservation, consortial acquisition

See also hbz flyer in English (PDF)

lobid

What is lobid?

LOD-based data infrastructure

Research & Development since 2010

Search UIs for end users &
web APIs (read only) for web developers

Version 2.0 recently went into production

Based on data from different sources

Data sources

lobid-resources

The data: hbz union catalogue

Cataloging libraries: 56 academic & special libraries,
1000 institute/departmental libraries

20 million records and 45 million holdings

Cataloging environment: Aleph

Source format: Aleph MAB2 XML

ETL

Daily export of Aleph MAB2 XML based on Aleph publishing mechanism

Transformation to N-Triples with Metafacture

Conversion to JSON-LD with addition of some concept labels with Etikett

Result is indexed into Elasticsearch
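
The last two steps can be sketched in a few lines. This is only an illustration in Python, not the production pipeline (which uses Metafacture and Etikett): it parses a very restricted subset of N-Triples, groups the triples by subject into JSON-LD-style documents, and serializes them in the newline-delimited format expected by the Elasticsearch bulk API. The sample triples, the `example.org` URIs and the index name `resources-example` are assumptions made for the example.

```python
import json
import re
from collections import defaultdict

# Matches simple N-Triples lines: <s> <p> <o> .  or  <s> <p> "literal" .
# Simplified on purpose: no blank nodes, language tags, datatypes or escapes.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')


def triples_to_docs(ntriples):
    """Group triples by subject into one JSON-LD-style document per resource."""
    docs = defaultdict(dict)
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip blank lines and unsupported syntax
        subject, predicate, iri, literal = match.groups()
        value = {"@id": iri} if iri is not None else literal
        doc = docs[subject]
        doc["@id"] = subject
        doc.setdefault(predicate, []).append(value)
    return list(docs.values())


def to_bulk_body(docs, index_name):
    """Serialize documents as newline-delimited JSON for the Elasticsearch bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["@id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical sample data, not taken from the hbz union catalogue.
SAMPLE = """
<http://example.org/resources/1> <http://purl.org/dc/terms/title> "An example title" .
<http://example.org/resources/1> <http://purl.org/dc/terms/creator> <http://example.org/agents/7> .
"""

docs = triples_to_docs(SAMPLE)
bulk_body = to_bulk_body(docs, "resources-example")
print(bulk_body)
```

The resulting string could be sent to an Elasticsearch `_bulk` endpoint; a real pipeline would also apply a JSON-LD context to get short keys instead of full property URIs.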

Let's take a look

https://lobid.org/resources

Example resource: Aleph export, JSON-LD
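
For web developers, the same look can be taken programmatically. A minimal sketch, assuming a search endpoint at `/resources/search` and the query parameters `q`, `format` and `size`; please check the API documentation at https://lobid.org/resources/api for the authoritative parameter list. The helper names `build_search_url` and `search` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://lobid.org/resources/search"  # assumed search endpoint


def build_search_url(query, size=3):
    """Build a search URL; `q`, `format` and `size` are assumed parameter names."""
    return BASE_URL + "?" + urlencode({"q": query, "format": "json", "size": size})


def search(query, size=3):
    """Fetch and decode one page of results (requires network access)."""
    with urlopen(build_search_url(query, size), timeout=10) as response:
        return json.load(response)


url = build_search_url("linked open data")
print(url)
```

Calling `search("linked open data")` would return the decoded JSON response; the sketch only prints the URL so that it runs without network access.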

API Documentation

See also "Documenting the lobid API" in the lobid blog

Part I: What to document?

Part II: How to document?

What to document?

Data set

API, including response format

RDF properties and classes

Provenance

Data set description

See also Data on the Web Best Practices

And the rest?

https://lobid.org/resources/api

Documenting API responses

Dull, without context:

I need examples!

But examples are often only an annex to the documentation, if given at all

"Descriptive approach" is predominant

Put the example at the center of the documentation!

Why not attach structured data (name, description, URI etc.) directly to examples?

Today, this is no problem with annotation tools like hypothes.is

Attached information

Name

Description

Coverage

Use cases

Provenance

URI
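
The attached information listed above could be modeled as a small record per documented field. This is a sketch only: the class name `FieldAnnotation`, its attribute names and the sample values are assumptions for illustration, and it says nothing about how hypothes.is stores annotations internally.

```python
from dataclasses import dataclass, field


@dataclass
class FieldAnnotation:
    """Structured information attached to one field of a live example."""
    name: str
    description: str
    coverage: str = ""
    use_cases: list = field(default_factory=list)
    provenance: str = ""
    uri: str = ""

    def render(self):
        """Format the annotation as labelled lines, skipping empty entries."""
        rows = [
            ("Name", self.name),
            ("Description", self.description),
            ("Coverage", self.coverage),
            ("Use cases", ", ".join(self.use_cases)),
            ("Provenance", self.provenance),
            ("URI", self.uri),
        ]
        return "\n".join(f"{label}: {value}" for label, value in rows if value)


# Hypothetical annotation for a "title" field; all values are illustrative.
title_note = FieldAnnotation(
    name="title",
    description="Main title of the resource",
    use_cases=["display in result lists", "full-text search"],
    uri="http://purl.org/dc/terms/title",
)
text = title_note.render()
print(text)
```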

Let's try this out

http://lobid.org/organisations/api/en#jsonld

Advantages

Contextualization of the documentation

Example is up-to-date, because live data is annotated

Feedback from API users via hypothes.is possible

Intuitive usage

Enables quicker and better understanding

Lessons Learned

...over the last couple of years

APIs

SPARQL is nice for complex queries

For many use cases, a performant web API is more reliable and convenient than a SPARQL endpoint

Also, you can implement Linked Data Fragments (LDF) on top of that API

Data Modeling

Reuse of existing vocabularies might be overrated

Rather add one level of indirection than use subproperties (see also this post)

Creating nice JSON-LD isn't straightforward yet (especially with more complex data & bnodes) & there are different ways of doing it
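
One of the "different ways of doing it" is to embed blank nodes where they are referenced instead of listing them as separate top-level nodes. The sketch below shows that idea on a flat list of JSON-LD-style nodes. It assumes blank nodes do not reference each other in cycles, and the property names (`contribution`, `agent`, `role`) and `example.org` URIs are illustrative only. A JSON-LD library with framing support would be the more robust choice in practice.

```python
def embed_blank_nodes(graph):
    """Replace {"@id": "_:bN"} references with the blank node's own properties."""
    nodes = {node["@id"]: node for node in graph}

    def resolve(value):
        if isinstance(value, list):
            return [resolve(item) for item in value]
        if isinstance(value, dict):
            ref = value.get("@id", "")
            if len(value) == 1 and ref.startswith("_:") and ref in nodes:
                # Embed the blank node and drop its label, which is no longer needed.
                return {k: resolve(v) for k, v in nodes[ref].items() if k != "@id"}
            return {key: resolve(item) for key, item in value.items()}
        return value

    # Keep only named nodes at the top level; blank nodes live inside them.
    return [resolve(node) for node in graph if not node["@id"].startswith("_:")]


# Hypothetical flat graph: a resource pointing to a blank "contribution" node.
FLAT = [
    {"@id": "http://example.org/resources/1",
     "contribution": [{"@id": "_:b0"}]},
    {"@id": "_:b0",
     "agent": {"@id": "http://example.org/agents/7"},
     "role": {"@id": "http://example.org/roles/author"}},
]

nested = embed_blank_nodes(FLAT)
print(nested)
```

The intermediate contribution node in the sample also illustrates the "one level of indirection" mentioned above: the role is a value on the node rather than a subproperty of a generic contributor property.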

Development

When something's missing, contribute to existing libraries instead of rolling your own

Sometimes it's better to build a new house instead of fixing an old one

Where can Alma improve over Aleph?

Ideally: publish well-formed, consistent and correct JSON-LD

More realistically: provide reliable and performant interfaces for getting data and updates (e.g. dumps + ResourceSync)

Write API (BIBFRAME- or MARC-based)

Q & A

Further resources

lobid blog

lobid on Twitter

lobid-resources code on GitHub