Adrian Pohl / @acka47
Linked Open Data, Hochschulbibliothekszentrum NRW (hbz)
SWIB17, Hamburg, 2017-12-05
This presentation:
http://slides.lobid.org/swib17-lightning-talk/
NWBib: a regional bibliography with 400k resources; its web version is based on the lobid-resources API
Besides an existing basic spatial classification, there are lots of strings referring to spatial subjects
~290k bibliographic resources with >300k occurrences of ~8,500 distinct "spatial strings"
"Bielefeld"
"Werther-Arrode"
"Kreis Olpe"
"Köln"
"Düsseldorf"
"Steele"
"Gronau "
"Lohmar"
"Kreis Olpe"
"Duisburg"
"Köln"
"Münster "
"Münster (Westf)"
"Düren"
"Wuppertal"
"Düsseldorf"
"Xanten"
"Ahlen "
"Grafschaft "
"Dortmund"
"Unterbruch, Heinsberg"
"Warendorf"
"Aachen"
"Sankt Hubert "
"Düsseldorf"
"Jülich"
"Bochum"
"Hagen"
"Jülich"
"Krefeld"
"Wuppertal-Sonnborn"
"Oberhausen-Sterkrade"
"Hagen"
"Bochum"
"Köln"
"Fröndenberg"
"Bad Honnef"
"Essen"
"Mülheim "
"Münster (Westf)"
"Bielefeld"
"Wesel "
"Duisburg"
"Kleve "
...
(see also)
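The figures above (distinct strings versus occurrences) can be reproduced with a simple tally. The following is a minimal sketch, not the code used for NWBib: the function names are made up for illustration, and it assumes that trailing blanks as in "Gronau " are noise that should be stripped before counting.

```python
from collections import Counter


def normalize(spatial_string: str) -> str:
    # Assumption: surrounding and repeated whitespace carries no meaning.
    return " ".join(spatial_string.split())


def count_spatial_strings(strings):
    # Map each normalized spatial string to its number of occurrences.
    return Counter(normalize(s) for s in strings)


# A few strings taken from the sample list above.
sample = ["Köln", "Münster ", "Münster (Westf)", "Köln",
          "Gronau ", "Kreis Olpe", "Kreis Olpe"]
counts = count_spatial_strings(sample)
distinct = len(counts)             # number of distinct spatial strings
occurrences = sum(counts.values())  # total number of occurrences
```

Note that "Münster" and "Münster (Westf)" remain two distinct strings here; deciding whether they refer to the same place is what the matching step is for.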
The spatial strings are matched to stable and non-ambiguous concept URIs that are part of a hierarchical classification
People can discover bibliographic resources by browsing the spatial classification
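To illustrate what browsing a hierarchical classification means in practice, here is a small sketch under stated assumptions. The concept IDs (`ex:...`), the resource IDs and the function names are hypothetical placeholders, not real NWBib or Wikidata identifiers. Selecting a concept returns the resources attached to it and to everything below it.

```python
from collections import defaultdict

# Hypothetical "broader" relations: child concept -> parent concept.
BROADER = {
    "ex:koeln": "ex:nrw",
    "ex:kreis-olpe": "ex:nrw",
    "ex:olpe": "ex:kreis-olpe",
}

# Hypothetical resources indexed by the concept they were matched to.
RESOURCES = {
    "ex:koeln": ["r1", "r2"],
    "ex:olpe": ["r3"],
    "ex:kreis-olpe": ["r4"],
}


def children_index(broader):
    # Invert child -> parent into parent -> [children].
    index = defaultdict(list)
    for child, parent in broader.items():
        index[parent].append(child)
    return index


def resources_under(concept, broader, resources):
    # Collect resources of the concept and of all its descendants.
    index = children_index(broader)
    found, stack = [], [concept]
    while stack:
        node = stack.pop()
        found.extend(resources.get(node, []))
        stack.extend(index.get(node, []))
    return sorted(found)
```

With this data, `resources_under("ex:kreis-olpe", BROADER, RESOURCES)` yields the resources for the district and for the town it contains.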
Use the German Integrated Authority File (GND) -> problems with multiple entries for one place (e.g. before and after incorporation into a larger administrative area) & missing hierarchy
Create and maintain a SKOS classification ourselves -> too labour-intensive
Use existing structured geo data & IDs
We not only get URIs, hierarchies & RDF descriptions
but also an infrastructure to maintain the data
along with some help from our friends (=Wikidata editors).
After a few adjustments we have pretty good results from the automatic matching
More than 99% of the resources with a spatial string now also have a Wikidata link, with 92% being a pretty reliable match
The only problems we noticed are with districts that are named after a town they contain because the town itself is scored higher than the district
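The district problem can be made concrete with a small sketch. This is not the matching code behind NWBib, and the talk does not describe a fix; the candidate structure (`label`, `score`, `kind`), the scores and the "Kreis" prefix rule are assumptions chosen for illustration. It shows how a plain highest-score pick prefers the town, and one possible tie-break that prefers a district when the string itself says "Kreis".

```python
def pick_by_score(candidates):
    # Plain strategy: the candidate with the highest score wins.
    return max(candidates, key=lambda c: c["score"])


def pick_candidate(spatial_string, candidates):
    # Hypothetical workaround: if the string starts with "Kreis ",
    # restrict the choice to district candidates when there are any.
    if spatial_string.strip().startswith("Kreis "):
        districts = [c for c in candidates if c["kind"] == "district"]
        if districts:
            return pick_by_score(districts)
    return pick_by_score(candidates)


# Made-up candidates and scores for the string "Kreis Olpe".
candidates = [
    {"label": "Olpe", "score": 0.9, "kind": "town"},
    {"label": "Kreis Olpe", "score": 0.7, "kind": "district"},
]

naive = pick_by_score(candidates)
adjusted = pick_candidate("Kreis Olpe", candidates)
```

Here `naive` is the town and `adjusted` is the district; for strings without the prefix both functions return the same candidate.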
A good overview of the results:
https://test.nwbib.de/classification?t=Wikidata
A wiki page describing the matching process and results (in German)
The hierarchical classification (beta) created from Wikidata: https://test.nwbib.de/classification?t=Wikidata