Here is a use case of an automated version of Sparnatural…

European Parliament Open Data Portal: a SHACL-powered knowledge graph
A second use case that Thomas wrote for Veronika Heimsbakk’s upcoming book, SHACL for the Practitioner, is about Sparna’s work for the European Parliament.
From validation of the data in the knowledge graph to further data integration and dissemination projects, many different uses of SHACL specifications were explored…
… and more exploratory uses of SHACL are foreseen!
“
A knowledge-graph powered open data portal
The European Parliament Open Data Portal (EPODP) went live in January 2023. Its particularity is that it is not a mere aggregation of documents or dump files exported from business applications in custom formats, but rather a collection of datasets, each extracted from a central semantic knowledge graph that itself aggregates data migrated from approximately twenty business applications. The result is a semantically interoperable open data portal: the semantics of its data model are clearly defined and documented, and the model reuses widely deployed existing ontologies. It already provides its data to different consumers (most notably the europarl website and the EU law tracker) in a context of cross-institution interoperability. The data captures the activity of the Parliament: as co-legislator together with the Council of the EU, the European Parliament (EP) holds plenary sittings in which reports originating from committees, as well as motions for resolutions, are amended and voted on; after the vote, the final adopted texts are published.
The EPODP’s focus on semantic interoperability maximizes the potential for reuse and linkage of its datasets, as well as the quality of the offered data. It comes, however, at a cost when building the portal: deep analysis and understanding of the structure of the existing data and documents is required to capture the business semantics. SHACL is the way to formally encode these business semantics. But how is it deployed in practice? How is it maintained? What are the different types of SHACL specifications used?
SHACL at the center of a model-driven approach
In the EPODP, SHACL is the basis of multiple model-driven uses, depicted in the following diagram:
There were two key drivers for introducing SHACL in the EPODP project: validation of the data in the knowledge graph, and generation of public documentation of the models. The same SHACL specification that captures the business semantics can be directly published as documentation and used to validate the data. The produced documentation is a set of public files, such as the ELI-EP application profile documentation and others accessible from the EPODP developer’s corner. The SHACL Play documentation generator is used to produce the documentation pages. Data validation happens at earlier stages, after the data transformation steps.
Two additional uses of SHACL specifications were explored. The first was to generate SPARQL queries that extract the content of datasets from the larger knowledge graph: the SHACL specification of a dataset’s content is interpreted to generate SPARQL CONSTRUCT queries, which are executed against the entire knowledge graph and return the subset of data corresponding to the specification. The query generation was implemented in SHACL Play; however, the EPODP chose to continue using manually crafted SPARQL queries to generate the datasets. The second was to complement the SHACL specifications with the mapping rules used to feed the corresponding properties or classes in the graph. This has the advantage that the mapping rules are documented and maintained alongside the specification rather than in a separate document. This work is ongoing.
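The idea of turning a dataset specification into an extraction query can be sketched as follows. This is a minimal illustration in Python, not the SHACL Play implementation: the PropertyShape and NodeShape classes, the construct_query function, the eli:Work target class and the namespace IRI are assumptions made for the example, and reading a minimum cardinality of 1 as a mandatory triple pattern is only one possible interpretation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PropertyShape:
    """Hypothetical, simplified view of a SHACL property shape."""
    path: str            # prefixed name of the property, e.g. "eli:adopts"
    min_count: int = 0   # value of sh:minCount; 0 means the property is optional


@dataclass
class NodeShape:
    """Hypothetical, simplified view of a SHACL node shape."""
    target_class: str
    properties: List[PropertyShape] = field(default_factory=list)


def construct_query(shape: NodeShape, prefixes: Dict[str, str]) -> str:
    """Build a SPARQL CONSTRUCT query returning the triples described by the shape."""
    header = [f"PREFIX {prefix}: <{iri}>" for prefix, iri in prefixes.items()]
    type_triple = f"  ?s a {shape.target_class} ."
    template = [type_triple]
    where = [type_triple]
    for index, prop in enumerate(shape.properties):
        triple = f"?s {prop.path} ?o{index} ."
        template.append(f"  {triple}")
        if prop.min_count > 0:
            # Assumption: a mandatory property restricts the extracted subset.
            where.append(f"  {triple}")
        else:
            # Optional properties must not exclude resources that lack them.
            where.append(f"  OPTIONAL {{ {triple} }}")
    return "\n".join(header + ["CONSTRUCT {"] + template + ["}", "WHERE {"] + where + ["}"])


# Illustrative shape: adopted texts must adopt something, other values are optional.
adopted_text = NodeShape(
    target_class="eli:Work",
    properties=[PropertyShape("eli:adopts", min_count=1), PropertyShape("eli:amends")],
)
query = construct_query(adopted_text, {"eli": "http://data.europa.eu/eli/ontology#"})
print(query)
```

Optional properties are wrapped in OPTIONAL so that a resource is still returned when they are absent. A real generator would also need to handle property paths, nested shapes and ranges.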
More exploratory uses of SHACL are foreseen: generating a query user interface from the SHACL specification using the Sparnatural query builder, and generating input forms to facilitate the creation of DCAT dataset descriptions. Additionally, automated generation of the JSON-LD context and the JSON Schema of the API is foreseen.
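As an illustration of what generating a JSON-LD context from a specification could look like, here is a minimal Python sketch. It does not reflect any existing EPODP tooling: the jsonld_context function, the namespace IRI, the use of eli:title as a literal-valued property and the flag marking IRI-valued properties (which one could imagine being read from sh:nodeKind) are all assumptions made for the example.

```python
import json
from typing import Dict, List, Tuple


def jsonld_context(prefixes: Dict[str, str], properties: List[Tuple[str, bool]]) -> dict:
    """Build a JSON-LD context with one term per property.

    Each property is given as (prefixed name, is_iri_valued).
    """
    context: Dict[str, object] = dict(prefixes)
    for path, is_iri_valued in properties:
        term = path.split(":", 1)[1]  # local name used as the JSON key
        definition = {"@id": path}
        if is_iri_valued:
            # Values are references to other resources, not literals.
            definition["@type"] = "@id"
        context[term] = definition
    return {"@context": context}


# Illustrative input: two properties, one IRI-valued and one literal-valued.
context = jsonld_context(
    {"eli": "http://data.europa.eu/eli/ontology#"},
    [("eli:adopts", True), ("eli:title", False)],
)
print(json.dumps(context, indent=2))
```

Using the local name as the JSON key keeps the API payload readable, but it assumes that local names do not clash across namespaces.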
Not « 1 SHACL to rule them all », but application profiles, dataset definitions, and migration specifications
The definition of the EPODP knowledge graph is not captured in a single SHACL specification, but rather in three different application profiles, each being a selection of classes and properties for one sub-domain: ELI-EP covers the description of documents and activities, ORG-EP covers the definitions of EP organisations (such as committees and political groups) and members of the Parliament, and SKOS-EP covers how controlled vocabularies are structured. In addition, DCAT-EP is the specification for how dataset records are described in the EPODP catalog, but this is not part of the knowledge graph per se.
Together, ELI-EP, ORG-EP and SKOS-EP specify the structure of the entire knowledge graph from which the datasets are extracted. In addition, the structure of each dataset family available in the EPODP (such as adopted texts, plenary documents, parliamentary questions, etc.) is also described in SHACL, in a specification referred to as a « DSD », for « Dataset Definition ». While the application profiles describe every possible property on generic shapes, the DSDs specify only the subset of properties used in a dataset, possibly with different cardinalities or ranges. For example, ELI-EP specifies that « a Work may have the property eli:adopts », with no minimum cardinality (eli:adopts is defined as « Indicates that the work represents the adopted work of one or several related works »). The DSD for the adopted texts datasets specifies the shape of « Adopted texts » as a subset of the Works, and indicates that the minimum cardinality of eli:adopts is 1 for this particular subset. Besides, some properties, such as eli:amends, are not available for adopted texts and are thus not declared in the DSD.
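The difference between the application profile and the DSD in this example can be sketched with a few lines of Python. This is a deliberately simplified illustration rather than a SHACL engine: shapes are reduced to a mapping from property to minimum cardinality, and the resource identifiers (ex:text-1, ex:text-2, ex:report-1) and the function name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Minimum cardinality per property, as stated in the example above.
ELI_EP_WORK: Dict[str, int] = {"eli:adopts": 0, "eli:amends": 0}  # application profile
DSD_ADOPTED_TEXT: Dict[str, int] = {"eli:adopts": 1}              # dataset definition


def min_count_violations(node: str, shape: Dict[str, int], triples: Set[Triple]) -> List[str]:
    """Report every property of the shape that has fewer values than required."""
    violations = []
    for path, min_count in shape.items():
        count = sum(1 for s, p, _ in triples if s == node and p == path)
        if count < min_count:
            violations.append(
                f"{node}: {path} has {count} value(s), expected at least {min_count}"
            )
    return violations


# Hypothetical data: ex:text-1 adopts a report, ex:text-2 has no eli:adopts value.
triples: Set[Triple] = {("ex:text-1", "eli:adopts", "ex:report-1")}

profile_result = min_count_violations("ex:text-2", ELI_EP_WORK, triples)
dsd_result = min_count_violations("ex:text-2", DSD_ADOPTED_TEXT, triples)
print(profile_result)
print(dsd_result)
```

The same resource conforms to the generic Work shape but is reported against the adopted texts shape, which is the stricter reading that the DSD adds on top of the application profile.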
In addition, the conversion of some data sources is also specified in independent SHACL files. The articulation of these three kinds of SHACL files with the reused ontologies is depicted in the following diagram:
There is currently no reuse of, or reference to, shapes across the different specifications: each one is independent. A nice improvement would be to study how the SHACL DSDs could be derived from the application profile SHACL, without redeclaring the identical constraints.
Editing SHACL in spreadsheets
In total, 16 SHACL specifications are currently published in the EPODP, and more are used to validate the data migrated from each individual source. The first step in the specification of each model is its design in a diagram, such as the ones visible in the public documentation of the models. The EPODP team then uses spreadsheets, adapted from the one provided in the SHACL Play suite, to encode the specifications. The spreadsheet is converted to SHACL using the xls2rdf converter. Spreadsheets provide a simple editing solution with an easy learning curve, made even easier by a few formulas that compute cell values automatically. They even provide ways of editing advanced patterns (such as the ability to directly enter Turtle lists for sh:or, or blank nodes for property paths), but of course still limit the expressivity. The following screenshot shows what property shapes look like in the spreadsheet:
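To give an idea of the principle, here is a minimal Python sketch that turns tabular rows into SHACL property shapes in Turtle. It is not xls2rdf and does not follow its spreadsheet layout: the column names, the ex:AdoptedText shape identifier, the use of eli:title and the rows_to_turtle function are assumptions made for the example, and prefix declarations are left out of the output for brevity.

```python
import csv
import io
from typing import List

# Hypothetical sheet content, exported as CSV: one row per property shape.
SHEET = """shape,path,min_count,max_count
ex:AdoptedText,eli:adopts,1,
ex:AdoptedText,eli:title,1,1
"""


def rows_to_turtle(sheet_text: str) -> str:
    """Convert each row into a sh:property block attached to its node shape."""
    blocks: List[str] = []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        constraints = [f"sh:path {row['path']}"]
        # Empty cells mean that the constraint is not declared.
        if row["min_count"]:
            constraints.append(f"sh:minCount {int(row['min_count'])}")
        if row["max_count"]:
            constraints.append(f"sh:maxCount {int(row['max_count'])}")
        body = " ;\n    ".join(constraints)
        blocks.append(f"{row['shape']} sh:property [\n    {body}\n] .")
    return "\n\n".join(blocks)


turtle = rows_to_turtle(SHEET)
print(turtle)
```

One row per property shape keeps the sheet easy to review, at the price of the limited expressivity mentioned above: anything beyond simple columns has to be entered as raw Turtle in a cell.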
Results and future perspectives
The EPODP use case shows how SHACL can be applied in a systematic way in a data integration and dissemination project: at the data transformation step, at the knowledge graph level, and at the data dissemination step. Public documentation, data validation and data extraction are tasks that can be automated based on a SHACL specification. While the context is one of a large public institution, the same approach can be applied in industrial contexts. The SHACL specifications are a cornerstone of such projects, enabling semantic interoperability at large and a mutual understanding between business experts, data analysts, developers, and data consumers.
”
Veronika’s book will be divided into three parts:
1. Back to Basics
Introduction to logic and RDF, brief skimming of the topics. Also covering various world assumptions.
2. Getting to know the stuff
Introduction to SHACL, including core, sh-sparql, advanced features.
3. Working with the stuff
SHACL Stories. Use cases, user stories and implementations.
Image : © European Union, [2024] – EP