This presentation:
http://slides.lobid.org/nwbib-wikidatacon/
NWBib = "North-Rhine Westphalian Bibliography"
A regional bibliography (Q1609504) that records literature about North-Rhine Westphalia (NRW), its regions, places and people
besides monographs, especially articles are recorded as well as maps, DVDs etc.
420.000 entries from publication year 1983 onwards
Web application: nwbib.de (based on lobid-resources API)
Cataloging in hbz union catalog (Aleph)
8,800 distinct strings for places (mostly administrative areas, but also monasteries, principalities, natural regions etc.)
Strings naively matched to Wikidata entries for geo coordinates (since 2014)
Drawbacks: non-unique names, no hierarchy, poor WD matching
3 "Brauweiler"
1 "Brauweiler, Puhlheim"
132 "Brauweiler <Pulheim>"
5 "Brauweiler, Pulheim"
4 "Breitenbruch <Arnsberg>"
1 "Brelen"
29 "Bremen <Ense>"
1 "Bremke <Menden, Sauerland>"
1 "Brempt"
1 "Bremscheid"
4 "Bremscheid <Eslohe, Sauerland>"
3 "Brenig"
2 "Brenken <Büren, Kreis Paderborn>"
21 "Brenken <Büren, Paderborn>"
See full list at https://gist.github.com/acka47/ccd3715201442e8cb78c70cca9ebd1ab
Hierarchical view of places based on administrative areas
One entry/ID for each place
We looked at Integrated Authority File (GND), GeoNames and Wikidata
We chose Wikidata as data base mainly because of its coverage, its existing infrastructure
Good coverage of place entries, geo coordinates and hierarchical information
Infrastructure for editing and versioning was already there plus a community we could participate in to keep the data up to date
NWBib editors did not want NWBib to directly rely on Wikidata
...because Wikidata servers are not under our control
Also fear of unwelcome edits or vandalism
We decided to manage an intermediate SKOS (Simple Knowledge Organization System) file the application would rely on
Matching via API isn't sufficient (e.g. because different levels of administrative areas have very similar names)
Matching via custom Elasticsearch index (maximizing precision by type restriction)
The index is built based on Wikidata SPARQL query for specific entities in NRW
(see query evolution I and II)
{
"spatial":{
"id":"http://www.wikidata.org/entity/Q897382",
"label":"Ehrenfeld",
"geo":{
"lat":50.9464,
"lon":6.91833
},
"type":[
"http://www.wikidata.org/entity/Q15632166"
]
},
"aliases":[
{
"language":"de",
"value":"Köln/Ehrenfeld"
},
{
"language":"de",
"value":"Köln-Ehrenfeld"
}
],
"locatedIn":{
"language":"de",
"value":"Köln-Ehrenfeld"
}
}
Successful automatic matching for >99% of records and ~92% of the place strings
For <1000 strings the matching was incorrect
Catalogers adjusted catalog records and made >6000 manual edits to Wikidata to reach 100% coverage (adding aliases & type information, creating new entries)
Based on Wikidata entries and hierarchical statements (mainly P131)
Add SKOS concept URIs and links to Wikidata to lobid/NWBib (see "spatial" object in JSON example)
{
"spatial":[
{
"focus":{
"id":"http://www.wikidata.org/entity/Q365",
"geo":{
"lat":50.942222222222,
"lon":6.9577777777778
},
"type":[
"http://www.wikidata.org/entity/Q22865",
"http://www.wikidata.org/entity/Q707813",
"http://www.wikidata.org/entity/Q200250",
"http://www.wikidata.org/entity/Q2202509",
"http://www.wikidata.org/entity/Q42744322",
"http://www.wikidata.org/entity/Q1549591"
]
},
"id":"https://nwbib.de/spatial#Q365",
"type":[
"Concept"
],
"label":"Köln",
"notation":"05315000",
"source":{
"id":"https://nwbib.de/spatial",
"label":"Raumsystematik der Nordrhein-Westfälischen Bibliographie"
}
}
]
}
Example entry
spatial.focus.id:"http://www.wikidata.org/entity/Q365"
Create Wikidata property for NWBib ID
Batch load NWBib IDs with QuickStatements (issue)
Add broader information to NWBib ID statements with qualifier P4900 (issue)
Classification is created based on base SKOS conf file and Wikidata: spatial classification (SKOS version)
New classification entries are included by adding a P6814 (NWBib ID) statement in Wikidata
Hierarchy is adjusted using the P4900 qualifier
Review process for new versions of the classification to be specified
Köln$$0https://nwbib.de/spatial#Q365
By the end of 2019 catalogers will start using Wikidata-based URIs for spatial subject indexing in Aleph
Existing NWBib entries will be updated in Aleph