
Sparnatural as a simple data federation facade

Thomas Francart — 2025-06-03T10:30:27Z

Together with the FAIR-data-evangelists of the MSH Val-de-Loire, we recently worked on the v2 of the OpenArchaeo portal, which uses Sparnatural as its core visual data exploration component (the v2 is not yet visible, but hopefully will be finalized and announced soon !)

It uses a new feature of Sparnatural : the ability to act as a single UI facade to multiple SPARQL endpoints. The user visually writes a single query, the query is sent to multiple data sources, and results are aggregated to be presented to the user in a single result set, that includes each result provenance.

This is depicted in the diagram below:

The user writes his query visually in Sparnatural, here « All archaeological sites where burials have been found, with the name of their discoverer, if known » :

The visual query is translated into SPARQL, and that SPARQL query is sent to each data source in the federation (actually, the user can select the ones he wants to query). This is depicted by the green arrows in the diagram, numbered « 1 ».

The SPARQL query looks like the following; note how it uses complex CIDOC-CRM property paths, such as the highlighted one, while the visual user query was simple:

Each SPARQL service returns a result. This is depicted by the orange arrows in the diagram, numbered « 2 ».

When every endpoint has answered, the results are aggregated into a single result set. This is number « 3 » in the diagram. During this aggregation, an extra column is added to the result set, containing the name of the source from which the result was retrieved.

The user sees the aggregated result; here, the name of the site, the name of its discoverer when known, and the source in which the result was found:

This is possible thanks to the catalog configuration of Sparnatural.

Sparnatural can be passed a catalog of SPARQL endpoints in a federation, and in this case, it will send the same SPARQL query to each, and will aggregate the results. This happens for the final query of course, but also during selection of values in the query UI.
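The fan-out and aggregation described above could be sketched as follows. This is a minimal illustration, not Sparnatural's actual code: the `federate` function, the catalog entries and the stubbed `fake_run_query` runner are all hypothetical names, and a real deployment would replace the stub with HTTP calls to each SPARQL endpoint.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, str]


def federate(query: str,
             catalog: Dict[str, str],
             run_query: Callable[[str, str], List[Row]]) -> List[Row]:
    """Send the same query to every endpoint of the catalog and merge the rows,
    tagging each row with the name of the source it came from."""
    def ask(item):
        name, url = item
        return [{**row, "source": name} for row in run_query(url, query)]

    with ThreadPoolExecutor() as pool:
        # list() waits for every endpoint before the aggregation starts.
        per_source = list(pool.map(ask, catalog.items()))
    return [row for rows in per_source for row in rows]


# Hypothetical catalog; the URLs are placeholders.
catalog = {"SourceA": "https://example.org/a/sparql",
           "SourceB": "https://example.org/b/sparql"}


def fake_run_query(url: str, query: str) -> List[Row]:
    """Stub standing in for a real SPARQL HTTP request (made-up data)."""
    data = {
        "https://example.org/a/sparql": [{"site": "Site 1", "discoverer": "A. Martin"}],
        "https://example.org/b/sparql": [{"site": "Site 2"}],
    }
    return data[url]


results = federate("SELECT ?site ?discoverer WHERE { ... }", catalog, fake_run_query)
```

Each row keeps the columns returned by its endpoint and gains a `source` column, which is the provenance information shown to the user.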

There are two main limits of this approach:

Limit 1 : all sources in the federation must share the same data model, as the same query is sent to every source
Limit 2 : each source must be independent : there should be no links from one source to another source so that the query can be solved by each endpoint independently (so actually, no truly distributed linked data)

Those are the reasons I have entitled the post « simple federation facade ». These two hypotheses are met in OpenArchaeo, and they were also met in the case of the (never released) prototype of the Europeana Linked Data taskforce. If you know other cases of data federation in which this is also true, tell us ! (we could actually try the same on a few DBpedia endpoints using the dbo ontology as a pivot model)

Now guess what ? In Sparnatural we have a « query UI to SPARQL » transformation step, thanks to the SHACL configuration of Sparnatural. Basically we can map a UI property to an underlying property path. It would then not be too difficult to do this mapping on a source-by-source basis, so that different queries are sent to each source from a single query in the UI. The result set structure would be the same, and result set aggregation could still happen. We would then overcome the first limit described above. That’s the next step !
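To make the idea concrete, here is a small sketch of what a source-by-source mapping could look like. It is speculative, since the feature does not exist yet: the `PATHS` table, the `ex:` properties and the `build_query` helper are invented for the illustration, and prefix declarations are left out.

```python
from typing import Dict

# Hypothetical mapping: one UI property, a different property path in each source.
PATHS: Dict[str, Dict[str, str]] = {
    "SourceA": {"discoveredBy": "ex:discovery/ex:carriedOutBy"},
    "SourceB": {"discoveredBy": "ex:discoverer"},
}


def build_query(source: str, ui_property: str) -> str:
    """Translate the same UI property into the path used by the given source."""
    path = PATHS[source][ui_property]
    return ("SELECT ?site ?person WHERE { "
            f"?site {path} ?person . "
            "}")


# One query per source, all sharing the same result variables.
queries = {source: build_query(source, "discoveredBy") for source in PATHS}
```

Because every generated query selects the same variables, the result sets keep the same columns and can still be aggregated.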

The post Sparnatural as a simple data federation facade first appeared on Sparna Blog.

European Parliament Open Data Portal : a SHACL-powered knowledge graph

Marie Muller — 2025-04-09T14:10:12Z

A second use case Thomas wrote for Veronika Heimsbakk’s upcoming book SHACL for the Practitioner is about Sparna’s work for the European Parliament.

From validation of the data in the knowledge graph to further projects of data integration and dissemination, many different usages of SHACL specifications were explored…

… and more exploratory usages of SHACL are foreseen !

“

A knowledge-graph powered open data portal

The European Parliament Open Data Portal (EPODP) went live in January 2023. Its particularity is that it is not a mere aggregation of documents or dump files from business applications in custom formats, but rather a collection of datasets each extracted from a central semantic knowledge graph, itself aggregating data migrated from approximately twenty business applications. The result is a semantically interoperable open data portal : the semantics of its data model are clearly defined and documented, and the model reuses widely deployed existing ontologies. It already provides its data to different consumers (most notably the europarl website and the EU law tracker) in a context of cross-institutions interoperability. The data captures the activity of the parliament : as co-legislator together with the Council of the EU, the European Parliament (EP) holds plenary sittings, in which reports originating from committees, as well as motions for resolutions, are amended and voted; after the vote, the final adopted texts are published.

The focus on semantic interoperability of EPODP maximizes the potential of reuse and linkage of its datasets, and maximizes the quality of the offered data. It comes however at a cost when building the portal : deep analysis and understanding of the existing data and documents structure is required to capture the business semantics. SHACL is the way to formally encode these business semantics – but how is it deployed in practice ? how is it maintained ? what are the different types of SHACL specifications used ?

SHACL at the center of a model-driven approach

SHACL in the EPODP is at the basis of multiple model-driven usages depicted in the following diagram:

There were two key drivers for introducing the use of SHACL in the EPODP project : validation of the data in the knowledge graph, and generation of public documentation of the models. The same SHACL specification that captures the business semantics is directly actionable to be published as documentation and to validate the data. The produced documentation is a set of public files, such as the ELI-EP application profile documentation and others accessible from the EPODP developer’s corner. The SHACL Play documentation generator is used to produce the documentation pages. Data validation happens at earlier stages, after data transformation steps.

Two additional usages of SHACL specifications were explored : one was to generate SPARQL queries to extract the content of datasets from the larger knowledge graph. The SHACL specification of a dataset content is interpreted to generate SPARQL CONSTRUCT queries, executed against the entire knowledge graph, to return a subset of data corresponding to the specification. The query generation was implemented in SHACL Play, however the EPODP chose to continue using manually crafted SPARQL queries to generate the datasets. The other usage was to complement the SHACL specifications with the mapping rules used to feed the corresponding properties or classes in the graph. This has the advantage that the mapping rules are documented and maintained alongside the specification and not in a separate document. This work is ongoing.
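As an illustration of the first of these two usages, a query generator could look roughly like the sketch below. It is a deliberately simplified assumption, not the SHACL Play implementation: the dataset specification is reduced to a target class and a flat list of properties, `ex:Work` and `ex:label` are placeholder names, and prefix declarations are omitted.

```python
from typing import List


def construct_query(target_class: str, properties: List[str]) -> str:
    """Build a CONSTRUCT query returning the instances of target_class
    with only the properties listed in the dataset specification."""
    template = [f"?s a {target_class} ."]
    where = [f"?s a {target_class} ."]
    for i, prop in enumerate(properties):
        template.append(f"?s {prop} ?o{i} .")
        # OPTIONAL so that instances lacking a property are still extracted.
        where.append(f"OPTIONAL {{ ?s {prop} ?o{i} . }}")
    return ("CONSTRUCT { " + " ".join(template) + " } "
            "WHERE { " + " ".join(where) + " }")


query = construct_query("ex:Work", ["eli:adopts", "ex:label"])
```

Executed against the whole knowledge graph, such a query would return only the triples that the specification declares, which is the subset published as a dataset.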

More exploratory usages of SHACL are foreseen : generating a query user interface based on the SHACL specification, using the Sparnatural query builder, and also input forms to facilitate the creation of DCAT datasets descriptions. Additionally, automated generation of the JSON-LD context and the JSON schema of the API are foreseen.

Not « 1 SHACL to rule them all », but application profiles, dataset definitions, and migration specifications

The definition of the EPODP knowledge graph is not captured in a single SHACL specification, but rather in three different application profiles, each being a selection of classes and properties of one sub-domain : ELI-EP covers the description of documents and activities, ORG-EP covers the definitions of EP organisations (such as committees, political groups, etc.) and members of the parliament, and SKOS-EP covers how controlled vocabularies are structured. In addition, DCAT-EP is the specification for how dataset records are described in the EPODP catalog – but this is not part of the knowledge graph per se.

Together, ELI-EP, ORG-EP and SKOS-EP specify the structure of the entire knowledge graph from which the datasets are extracted. In addition, the structure of each dataset family available in the EPODP (such as adopted texts, plenary documents, parliamentary questions, etc.) is also described in SHACL, referred to as « DSD » for « Dataset Definition ». While the application profiles describe every possible property on generic shapes, the DSDs specify only the subset of properties used in a dataset, with possibly different cardinalities or ranges. For example, ELI-EP specifies that « a Work may have the property eli:adopts » with no minimum cardinality (eli:adopts is defined as « Indicates that the work represents the adopted work of one or several related works »). The DSD for adopted texts datasets specifies the shape of « Adopted texts » as a subset of the Works, and indicates that the minimum cardinality of eli:adopts is 1 for this particular subset. Besides, some properties, such as eli:amends, are not available for adopted texts, and are thus not declared in the DSD.
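The difference between the two levels of specification can be illustrated with a toy check. This sketch reduces each specification to a table of minimum cardinalities, which is a strong simplification of what `sh:minCount` validation does in a real SHACL engine; the `violations` function and the `ex:doc1` resource are hypothetical.

```python
from typing import Dict, List

# Minimum cardinality per declared property (simplified view of the shapes).
APPLICATION_PROFILE = {"eli:adopts": 0, "eli:amends": 0}  # generic Work shape
ADOPTED_TEXTS_DSD = {"eli:adopts": 1}                      # eli:amends is not declared


def violations(resource: Dict[str, List[str]], spec: Dict[str, int]) -> List[str]:
    """List the properties of the spec that have too few values on the resource."""
    problems = []
    for prop, min_count in spec.items():
        if len(resource.get(prop, [])) < min_count:
            problems.append(f"{prop}: expected at least {min_count} value(s)")
    return problems


# A Work that amends another document but adopts nothing.
work = {"eli:amends": ["ex:doc1"]}
profile_result = violations(work, APPLICATION_PROFILE)
dsd_result = violations(work, ADOPTED_TEXTS_DSD)
```

The same resource is acceptable for the generic application profile, but is reported when checked against the stricter dataset definition.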

In addition, the conversion of some data sources is also specified in independent SHACL files. The articulation of these 3 kinds of SHACL files and the reused ontologies is depicted in the following diagram:

There is currently no reuse or reference of shapes across the different specifications. Each is independent. A nice improvement would be to study how SHACL DSDs could be derived from the application profile SHACL, without redeclaring the identical constraints.

Editing SHACL in spreadsheets

In total 16 SHACL specifications are currently published in the EPODP, and around 80 are used to validate data migrated from each individual source. The first step in the specification of each model is the design of a diagram such as the ones visible in the public documentation of the models. The EPODP team then uses spreadsheets to encode the specifications, adapted from the one provided in the SHACL Play suite. The spreadsheet is converted to SHACL using the xls2rdf converter. Spreadsheets provide a simple editing solution, with an easy learning curve, made even easier with a few formulas to compute cell values automatically. They even provide ways of editing advanced patterns (such as the ability to directly write Turtle lists for sh:or, or blank nodes for property paths), but of course still limit the expressivity. The following screenshot shows what property shapes look like in the spreadsheet:

Results and future perspectives

The EPODP use-case shows how SHACL can be applied in a systematic way in a data integration and dissemination project : at the data transformation step, at the knowledge graph level, and at the data dissemination level. Public documentation, data validation and data extraction are tasks that can be automated based on a SHACL specification. While the context is one of a large public institution, the same approach can be applied in industrial contexts. The SHACL specifications are a cornerstone of such projects, enabling semantic interoperability at large and a mutual understanding between business experts, data analysts, developers, and data consumers.

”

Veronika’s book will be divided into three parts :

1. Back to Basics
Introduction to logic and RDF, brief skimming of the topics. Also covering various world assumptions.

2. Getting to know the stuff
Introduction to SHACL, including core, sh-sparql, advanced features.

3. Working with the stuff
SHACL Stories. Use cases, user stories and implementations.

Image : © European Union, [2024] – EP


The Genesis of Sparnatural in the context of the OpenArchaeo platform

Marie Muller — 2025-03-28T14:54:53Z

The OpenArchaeo platform, developed by French consortium Huma-Num MASAplus (Mémoire des Archéologues et des Sites Archéologiques) together with SPARNA, is a platform dedicated to archaeological data interoperability. This semantic interoperability objective relies on the strong conceptual foundations offered by the CIDOC-CRM data model.

Paired with the CIDOC-CRM in a federated way, OpenArchaeo aims at :

making available the archaeological datasets produced by the MASAplus consortium’s partners on the semantic web, in the form of a triplestore with data aligned with the ontology and its extensions dedicated to archaeology ;
providing an intuitive query interface for archaeological data.

The latter query interface integrates the Sparnatural knowledge graph exploration component. The UI of this component was heavily inspired by the British Museum’s ResearchSpace semantic search feature, as the system lets the user build his own queries based on the CIDOC-CRM model underlying the data.

About ResearchSpace platform

Initiated in 2009 by a cross-disciplinary team at the British Museum, ResearchSpace is « A full CIDOC-CRM authoring and search system, based on an exhaustive collection of forms that reflects all applicable relationships from the CIDOC CRM ontology. »

Among a wide range of semantic tools to create, manipulate, analyse and visualise data, the platform provides a semantic structured search component based on categories and relations.

While open source, ResearchSpace’s code didn’t fit our architecture : we just chose to follow the simple visual elements of ResearchSpace’s query interface to develop our own Sparnatural query builder for OpenArchaeo, and set up a system of icons to identify the main components of the archaeological data.

ResearchSpace has recently (December 2024) released a brand new 4.0.0 version. This latest version can be installed easily and now comes with a default setup of forms based on the CIDOC-CRM. It enables image annotation, knowledge map creation, semantic narrative writing, timeline production, and more semantic tools.

Sparnatural’s first use-case was OpenArchaeo’s CIDOC-CRM model !

The structure of the knowledge graph of OpenArchaeo relies on the CIDOC-CRM and some of its extensions (CRMarchaeo, CRMsci and CRMba). It is a generic model that covers the basic concepts found in most archaeological corpuses (site, operation, structure, feature, wall, burial, stratigraphic unit and artifact).

Here is a focus on class S19 :

Several external thesauri were also added for querying the datasets : the PACTOLS thesaurus for archaeology, but also Geonames and Periodo for spatial and temporal searches.

This way, when users wish to connect two elements (artifact and site for example), the interface automatically suggests the available relationships between these entities, enabling users to formulate their request in a simple way without having to know either the entities and properties of CIDOC CRM, or the structure of the system : the SPARQL queries that correspond to the sentences visually built by users will be automatically computed. In addition, the usage of thesauri allows the users to cross-reference easily multiple datasets through the different widgets proposed in Sparnatural.
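The suggestion mechanism can be pictured with a small sketch. This is only an illustration of the principle: the `CONFIG` entries, their labels and the two helper functions are made up for the example and do not reflect the actual OpenArchaeo configuration.

```python
from typing import List, Tuple

# Hypothetical configuration entries: (domain class, UI property, range class).
CONFIG: List[Tuple[str, str, str]] = [
    ("Artifact", "found in", "Site"),
    ("Artifact", "made of", "Material"),
    ("Site", "located in", "Place"),
]


def reachable(domain: str) -> List[str]:
    """Classes the user can connect to, starting from the given class."""
    seen: List[str] = []
    for d, _, r in CONFIG:
        if d == domain and r not in seen:
            seen.append(r)
    return seen


def suggest(domain: str, range_: str) -> List[str]:
    """Relationships offered once both ends of the connection are chosen."""
    return [prop for d, prop, r in CONFIG if d == domain and r == range_]
```

For instance, `reachable("Artifact")` lists the classes offered after picking an artifact, and `suggest("Artifact", "Site")` lists the relationships offered between the two; each of them would then be translated into its underlying CIDOC-CRM property path when the SPARQL query is computed.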

Get the latest release of Sparnatural !

Since it was created for OpenArchaeo in 2019, the Sparnatural UI has been fully redesigned. It now offers a wide range of features, from different widgets for value selection (dropdown lists, ordered by occurrence count or alphabetically, autocomplete search fields, date pickers, tree widgets…) to brand new result display plugins : the default visualisation is a table of results, but if the results are geolocated they can be shown on a map. Grid, stats, pie or bar chart, and timeline plugins have also been made available and documented.

To go further on OpenArchaeo’s platform …

See a presentation of the project on the CIDOC Museum Documentation Channel
(« Semantic modelling of archaeological data online workshop series »)

The platform : http://openarchaeo.huma-num.fr/

The project : https://masa.hypotheses.org/openarchaeo

Read the full research paper about the project : https://ceur-ws.org/Vol-2375/paper1.pdf

Image : Vestiges of a large villa in Courbehaye « les Deux Muids / le Moulin de Mongé », photo Alain Lelong (2003), Atlas des Établissements Ruraux de Beauce Antique, licence CC BY-NC-SA


A look back at … the deployment of Sparnatural for FranceArchives

Marie Muller — 2025-02-14T17:35:50Z

It has now been almost a year and a half since FranceArchives, the national portal of the Archives de France, announced through its networks the deployment of the « Supernatural » tool (read: Sparnatural), with the aim of offering its users « a new access to archival metadata, complementary to the classic search through the portal’s search engine ».

Run by the Service interministériel des Archives de France, the FranceArchives portal offers a federated search across nearly 26 million archival metadata records produced by nearly 170 institutions and entirely converted to RDF using version 0.2 of the RiC-O ontology, published in February 2021.

It is one of the first large-scale uses of RiC-O (even though the portal will have to be updated to RiC-O version 1.0, published since then !), and it is also one of the first archival data repositories of this size in the Linked Open Data.

… A data graph that has everything it takes to be « Supernaturalized »

From quality data to an augmented search

… Well, it is mostly about its « quality data », in other words the :

finding aids with their components,
descriptive records of archive creators,
identification sheets of the archive services,

… all of them objects linked to a quality « persons and institutions », « places » or « themes » authority (that is, less than 5% of the portal’s metadata before conversion… and more than 70% of the whole RDF store !), quality authorities that are themselves harmonized and aligned with national and international reference datasets.

As a model particularly suited to describing archives in RDF, the RiC-O ontology (v0.2) was used to convert the XML EAD and XML EAC-CPF data to RDF, complemented with schema.org for the identification sheets of the services in the directory, in XML EAG format.

Since the information about archives and about their creators is described in separate files, advanced search through SPARQL now enables a finer federated querying of a vast corpus of records by « traversing » the graph structured according to the RiC-O model. Indeed, the benefit of querying through SPARQL is to break the silos between types of metadata : it enables a cross-cutting search over data coming from EAD files and EAC-CPF files.

The records displayed in the search results show the existing alignments to external creator records, Wikidata, data.bnf, GeoNames or the Thesaurus pour l’indexation matières des archives locales. The results of the RDF conversion are thus exploited in the Personnes/indexations liées tab, through suggestions of complementary searches on the classic portal.

A way to let the general public benefit from RDF in a way that is completely transparent to them !

A few example queries…

The tool is accessed through the « Recherche SPARQL » menu at the top right of the portal’s website :

Several example queries are available to explore the data :

From the simplest query :

Person is member of Institution

To increasingly elaborate and complex queries, like this one :

Places that are the subject of the archives related to the fonds « Fabrique de berlingot Eysséric »

Here we can see that the path of the query through the graph of the RiC-O ontology can be traced by clicking on « Afficher/masquer l’éditeur SPARQL » (show/hide the SPARQL editor).

Archives keeping up with the times…

Note that the project, which was presented at SWIB (Semantic Web in Libraries) and at SemWebPro 2023, was entirely deployed (and configured !) from the documentation available on the Sparnatural website.

Do not hesitate to have a look at it !

Hello Sparnatural

How-to configure in SHACL

Reference documentation of Sparnatural widgets

To go further on the semantization of archives…

The deployment of Sparnatural on FranceArchives follows another achievement from the previous year, the Sparnatural demonstrator of the Archives nationales. It made it possible to evolve Sparnatural and to deploy it on a RiC-O semantic graph of 20 million triples (without inference), fed with the content of 1577 finding aids describing the archives of 40 of the 122 Paris notarial offices kept at the Archives nationales, of 1120 records describing these offices and the notaries who practised there, and of other reference datasets of the Archives nationales, notably on the places of Paris. The making of this demonstrator has been entirely documented in French and in English. This demonstrator and its interfaces will in fact evolve soon.

Since then, Sparna has been involved in the field of archive semantization, as we are also developing, on behalf of the Archives Nationales, the Ric-O converter tool.

It converts EAD and EAC records to RDF expressed in RiC-O. We are currently finalizing a new version of the converter to make it compatible with RiC-O 1.0 (and even 1.1, whose release is imminent).

A new article about RiC-O to be published here ? … Stay tuned !


Nakala : from an RDF dataset to a query UI in minutes – SHACL automated generation and Sparnatural

Marie Muller — 2025-02-06T10:38:25Z

Here is a use case of an automated version of Sparnatural, submitted as an example for Veronika Heimsbakk’s upcoming book SHACL for the Practitioner, about the Shapes Constraint Language (SHACL).

“

The Sparnatural knowledge graph explorer leverages SHACL specifications to drive a user interface (UI) that allows end users to easily discover the content of an RDF graph. What is the best way to make this UI-oriented SHACL specification ? if a SHACL specification for the knowledge graph structure already exists, can it be used directly ? does it require customization ? or is the Sparnatural SHACL spec completely decoupled from an existing knowledge graph spec ? and what if no SHACL spec exists at all ?

We faced all these different situations while deploying Sparnatural, and used various approaches to produce a satisfying end-user oriented specification. In particular, the Nakala repository is one of the latest graphs for which Sparnatural was deployed. Nakala is a data repository that aims to preserve and disseminate data produced by French research projects in the Humanities and Social Sciences, in compliance with the FAIR principles. Nakala is a service offered by Huma-Num, a research infrastructure dedicated to the digital humanities. The Nakala knowledge graph contains `dcterms` metadata provided by researchers to describe the resources they upload. Additional non-dcterms metadata can also be provided. The metadata varies in quality and quantity depending on the researcher. When exposed in a SPARQL endpoint, resources, collections of resources and agents are described using the Europeana Data Model (EDM).

As the EDM dissemination channel for Nakala was new, no SHACL specification existed for it. We could have designed one for Sparnatural from scratch, but the choice was made to generate it automatically, with no human intervention. This was for three reasons : ease of configuration, flexibility in maintenance over time, and pedagogical reasons, as it was important to explain the structure of the graph to target users.

Sparnatural UI

Let’s first have a look at what the Sparnatural UI looks like on an example from Nakala:

Once you know that « ProvidedCHO » stands for « Provided Cultural Heritage Object », and that « asWKT » encodes the location of a Place, you will be able to understand that the query searches for all ProvidedCHO entries gathered into a certain collection (« Cartes Université Bordeaux Montaigne » – a collection of maps), and selects their location and an optional description (and yes, the results of this query are displayed on a map, but that’s out of scope).

SHACL is derived automatically

In this project we wanted the shortest path from the graph to the query UI. Hence we used a SHACL generation algorithm, available in SHACL Play. By issuing SPARQL queries on an RDF graph, the algorithm determines the NodeShapes (targeting the classes used as values of `rdf:type`), and PropertyShapes (from all predicates used on instances of each class) of the model, with their node kinds, datatypes, class range, and cardinalities. It generates `sh:or` constraints when multiple datatypes or ranges are found. Note that in the case of Nakala a large variety of ranges are used, since the data comes from very open user inputs : the same `dcterms` property can be either an IRI or a Literal, with varying datatypes.

In addition, the algorithm computes some statistics on the dataset : the number of targets of each NodeShape, the number of occurrences and the number of distinct values for each property shape. The statistics are expressed using the `void` vocabulary, and `dcterms:conformsTo` is used to link void partitions to the corresponding shapes.
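The principle of the profiling step can be sketched on a handful of in-memory triples. This is an assumption-laden simplification, not the SHACL Play algorithm: the real one works through SPARQL queries and also derives node kinds, datatypes, ranges, cardinalities and `sh:or` constraints, all of which are omitted here. The `profile` function, its output layout and the sample resources are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

RDF_TYPE = "rdf:type"
Triple = Tuple[str, str, str]


def profile(triples: List[Triple]) -> Dict[str, dict]:
    """One entry per class used as rdf:type value, with per-property statistics."""
    types: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    shapes: Dict[str, dict] = {}
    for cls in sorted({c for classes in types.values() for c in classes}):
        instances = {s for s, classes in types.items() if cls in classes}
        props: Dict[str, dict] = {}
        for s, p, o in triples:
            if s in instances and p != RDF_TYPE:
                stat = props.setdefault(p, {"occurrences": 0, "values": set()})
                stat["occurrences"] += 1
                stat["values"].add(o)
        shapes[cls] = {
            "targets": len(instances),  # number of targets of the node shape
            "properties": {
                p: {"occurrences": st["occurrences"], "distinct": len(st["values"])}
                for p, st in props.items()
            },
        }
    return shapes


# Made-up sample data.
triples = [
    ("ex:m1", RDF_TYPE, "edm:ProvidedCHO"),
    ("ex:m2", RDF_TYPE, "edm:ProvidedCHO"),
    ("ex:m1", "dcterms:title", "Map 1"),
    ("ex:m2", "dcterms:title", "Map 2"),
    ("ex:m1", "dcterms:creator", "ex:alice"),
    ("ex:m2", "dcterms:creator", "ex:alice"),
]
stats = profile(triples)
```

In this sample, `dcterms:creator` occurs twice but has a single distinct value, which is exactly the kind of figure later used to choose a value selection widget.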

The SHACL Play documentation tool was then used to generate a report of the generated SHACL combined with the statistics. A few errors were spotted in the exported data, and fixed. We also saw that around 70 properties were present only a few times out of 700,000+ ProvidedCHO records. These properties were probably applied by a single researcher, or very few, when describing their data. It was decided to filter them out to keep the final UI simple, with an extra filtering step : based on statistics, property shapes used by less than 0.1% of the number of targets of their node shapes are removed.
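The filtering rule is simple enough to be sketched in a few lines. The structure of the `shapes` dictionary and the `filter_rare` function are assumptions made for this example, and the occurrence count given for `dct:title` is invented; only the 0.1% threshold, the order of magnitude of the record count and the two rare properties come from the text.

```python
from typing import Dict


def filter_rare(shapes: Dict[str, dict], threshold: float = 0.001) -> Dict[str, dict]:
    """Drop property shapes whose occurrence count is below threshold * targets."""
    kept = {}
    for cls, shape in shapes.items():
        minimum = shape["targets"] * threshold
        kept[cls] = {
            "targets": shape["targets"],
            "properties": {p: n for p, n in shape["properties"].items() if n >= minimum},
        }
    return kept


shapes = {
    "edm:ProvidedCHO": {
        "targets": 700_000,
        "properties": {
            "dct:title": 650_000,      # invented figure
            "dct:isRequiredBy": 81,
            "dct:isReplacedBy": 1,
        },
    }
}
filtered = filter_rare(shapes)
```

With about 700,000 targets the cut-off is around 700 occurrences, so a property used 81 times is removed just like one used once.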

Here is a screenshot of the report : the right column shows the number of distinct values, and the column before is the number of total occurrences; we can immediately see that `dct:isReplacedBy` occurs only once, and `dct:isRequiredBy` occurs 81 times. They will be filtered out.

Sparnatural reads SHACL

Sparnatural can then read the SHACL specification, together with the dataset statistics. When designing a query, value selection widgets for literal properties are determined by looking at the `sh:datatype` constraint (for number, date, boolean, or map widgets). For IRI properties, statistics are used to distinguish between list and autocomplete widgets. Predicates with fewer than 500 distinct values will use a dropdown list, and those with more will use an autocomplete search field. The range is determined by reading `sh:class` or `sh:node`. The label to show in dropdown lists or to search on in autocomplete fields is determined by looking at a `dash:propertyRole = dash:LabelRole` annotation.
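The decision logic described above could be sketched as follows. The widget names, the datatype table, the fallback for other literals and the behaviour at exactly 500 distinct values are assumptions of this example; only the 500 threshold and the general rule come from the text.

```python
from typing import Optional

LIST_THRESHOLD = 500  # distinct-value limit mentioned in the text

# Hypothetical widget names and datatype mapping, for illustration only.
DATATYPE_WIDGETS = {
    "xsd:date": "date",
    "xsd:dateTime": "date",
    "xsd:integer": "number",
    "xsd:decimal": "number",
    "xsd:boolean": "boolean",
    "geo:wktLiteral": "map",
}


def pick_widget(datatype: Optional[str], distinct_values: int) -> str:
    """Choose a value selection widget for one property shape."""
    if datatype is not None:
        # Literal property: decide from sh:datatype (assumed fallback: text search).
        return DATATYPE_WIDGETS.get(datatype, "search")
    # IRI property: decide from the number of distinct values in the statistics.
    return "list" if distinct_values < LIST_THRESHOLD else "autocomplete"
```

A property with 120 distinct IRI values would get a dropdown list, while one with 5000 would get an autocomplete field.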

How about labels ? Sparnatural can read them from classes and properties of the original OWL file, if provided with it. Otherwise local names of target classes or predicates are used.
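The fallback can be pictured with a short sketch. The `display_label` and `local_name` functions and the `owl` dictionary are hypothetical; in practice the labels would be read from the OWL file rather than from a hand-written table.

```python
from typing import Dict, Optional


def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[cut + 1:]


def display_label(iri: str, owl_labels: Optional[Dict[str, str]] = None) -> str:
    """Prefer the label found in the ontology, else fall back to the local name."""
    if owl_labels and iri in owl_labels:
        return owl_labels[iri]
    return local_name(iri)


# Labels as they could be read from an ontology file (hand-written here).
owl = {"http://www.europeana.eu/schemas/edm/ProvidedCHO": "Provided Cultural Heritage Object"}
```

Without the ontology, `http://www.opengis.net/ont/geosparql#asWKT` is simply displayed as `asWKT`, which is what appears in the example query shown earlier.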

Other configuration techniques

Other Sparnatural deployments, such as the DBpedia demo, are designed in SHACL from scratch, using spreadsheets. This requires more manual work, but has the advantage of tailoring the UI to exactly what needs to be shown, including user-oriented labels/tooltips/icons, hiding some properties, taking shortcuts or declaring inverses using property paths, etc. In the case of DBpedia, no SHACL spec exists, and deriving it automatically for the entire graph would probably not make a lot of sense, hence the necessity for a manual design.

For other projects we are working on a third configuration technique : a SHACL spec that describes the exact content of the graph is first built. It is used to publish the documentation of the model and to validate the data. A separate shapes file containing a Sparnatural-specific configuration layer is then added on top of it. That layer can hide shapes by applying an `sh:deactivated` annotation on them, can specify the UI widgets to use, add additional `dash:LabelRole` flags, add shortcut or inverse properties, etc.
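The layering idea can be sketched with plain dictionaries standing in for the two shapes files. Everything here is illustrative: the shape identifiers, the `apply_layer` function and the dictionary representation are assumptions, and a real implementation would merge RDF graphs instead.

```python
from typing import Dict


def apply_layer(base: Dict[str, dict], layer: Dict[str, dict]) -> Dict[str, dict]:
    """Merge the UI annotations onto the base shapes, then drop deactivated shapes."""
    merged = {}
    for shape_id, shape in base.items():
        combined = {**shape, **layer.get(shape_id, {})}
        if not combined.get("sh:deactivated", False):
            merged[shape_id] = combined
    return merged


# Base specification: describes the exact content of the graph.
base = {
    "ex:Document_title": {"sh:path": "dcterms:title"},
    "ex:Document_internalId": {"sh:path": "ex:internalId"},
}
# Sparnatural-specific layer: flags a label property and hides a technical one.
layer = {
    "ex:Document_title": {"dash:propertyRole": "dash:LabelRole"},
    "ex:Document_internalId": {"sh:deactivated": True},
}
ui_shapes = apply_layer(base, layer)
```

The base file stays untouched and usable for documentation and validation, while the UI only sees the shapes left after the layer is applied.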

The 3 configuration paths are shown in the following diagram:

Your query UI in minutes

We combined 4 tools (all open-source) : an algorithm to generate a « profile » in SHACL of an RDF dataset, a statistical report generator, a SHACL filter based on statistics, and the Sparnatural query UI. The ability to generate the SHACL profile and review it in the report provided a way to understand the structure of the data in a matter of minutes, while hours would have been necessary with SPARQL queries, without a guarantee of completeness. The provision of the query UI was made by dropping the SHACL file and the statistics to Sparnatural, without manual intervention. This shows the pivotal role of SHACL for data quality and model-driven approaches for knowledge graphs projects.

”

We look forward to reading Veronika’s book, and you ?


Sparnatural SHACL configuration : manual, automated, off-the-shelf

Thomas Francart — 2025-01-21T07:00:35Z

Sparnatural is a knowledge graph visual browser made for end-users. The user is guided in the creation of her graph traversal query by selecting the kind of entities she is searching, how these entities are connected to other entities in the graph, and which properties she would like to have in her result columns.

The possible entities, connections and properties that are shown to the user need to be specified to Sparnatural. This configuration is written in a SHACL specification. This specification encodes both the structure of the knowledge graph (as we want it to be presented to the user), plus some additional UI-oriented information, like icons, order of entries, or value selection widgets to use.

How can the SHACL configuration of Sparnatural be produced ? we faced 3 different situations : either we do it manually, or (semi-)automatically, or with an off-the-shelf specification complemented with manual annotations. We will give you a brief description of each possibility below.

Write your SHACL manually (in Excel)

SHACL has one main disadvantage : there is no widely available, free-to-use SHACL editor to create/edit these specs (while Protégé, for example, allows anyone to edit an OWL ontology). To overcome this lack of tooling, we are using an Excel-to-SHACL conversion tool, xls2rdf. We (and our users) write the specification in an Excel template, that is then turned into SHACL by an API call to xls2rdf. This is what we described in a previous post.

We have good documentation on how to design your Sparnatural configuration in Excel, if you want to try it (but see the Hello Sparnatural tutorial first to set up your working test page). And we are here to help if you need it ! (ask questions on the Github repository).

If you are looking for a general-purpose SHACL Excel template, there is one in SHACL Play.

Get your SHACL automatically (statistics included)

SHACL is machine-readable and also writable, so we can use our SHACL generation algorithm to produce a SHACL « profile » of an RDF dataset. The algorithm sends SPARQL queries to identify all classes and properties, with their range, cardinalities, datatype, etc. This SHACL profile can be fed to Sparnatural directly. With an additional bonus : it is also very useful to gather statistics on the dataset at this stage : number of instances of each class, number of occurrences of each property and number of distinct values. Why is it useful ? First because Sparnatural is able to show them in the UI, giving a hint to the user on how many entities of each type exist in the dataset:

Second, because by knowing how many distinct values exist for a given property, Sparnatural can show either a dropdown list (if there are fewer than 500) or an autocomplete search field (if there are more).
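The widget choice described above boils down to a threshold test on the number of distinct values. Here is a minimal Python sketch of that rule; the function and constant names are hypothetical, and treating exactly 500 values as « autocomplete » is an assumption, since only the two sides of the cut-off are spelled out above.

```python
# Hypothetical helper: the 500 cut-off comes from the article, the names do not.
DROPDOWN_MAX_DISTINCT = 500


def choose_widget(distinct_values: int, threshold: int = DROPDOWN_MAX_DISTINCT) -> str:
    """Return 'dropdown' for small value sets and 'autocomplete' otherwise."""
    if distinct_values < 0:
        raise ValueError("distinct_values must be non-negative")
    # Assumption: a property with exactly `threshold` values gets the autocomplete field.
    return "dropdown" if distinct_values < threshold else "autocomplete"


print(choose_widget(42))
print(choose_widget(12000))
```

Keeping the threshold as a parameter makes it easy to experiment with another cut-off on a small test dataset.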

Also, as a side effect, seeing the profile and statistics of your dataset can help you spot errors (« why do the statistics tell me that this identifier is not always present ? it should be mandatory ! »).
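To make the idea of a dataset profile concrete, here is a small in-memory sketch in Python. It is not the actual generation algorithm (which, as said above, works by sending SPARQL queries); it only computes the same kind of figures over a list of triples. All names and the sample data (`ex:Site`, `ex:identifier`) are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def profile(triples):
    """Compute per-class instance counts and per-property usage statistics."""
    instances = defaultdict(set)  # class -> set of subjects typed with it
    for s, p, o in triples:
        if p == RDF_TYPE:
            instances[o].add(s)

    stats = {}
    for cls, subjects in instances.items():
        props = defaultdict(lambda: {"subjects": set(), "occurrences": 0, "values": set()})
        for s, p, o in triples:
            if s in subjects and p != RDF_TYPE:
                entry = props[p]
                entry["subjects"].add(s)
                entry["occurrences"] += 1
                entry["values"].add(o)
        stats[cls] = {
            "instances": len(subjects),
            "properties": {
                p: {
                    "occurrences": e["occurrences"],
                    "distinct_values": len(e["values"]),
                    # False means at least one instance lacks the property
                    "always_present": len(e["subjects"]) == len(subjects),
                }
                for p, e in props.items()
            },
        }
    return stats


triples = [
    ("ex:site1", RDF_TYPE, "ex:Site"),
    ("ex:site1", "ex:identifier", "S-001"),
    ("ex:site2", RDF_TYPE, "ex:Site"),  # this site has no identifier
]
report = profile(triples)
print(report["ex:Site"]["properties"]["ex:identifier"])
```

The `always_present` flag is the kind of signal that reveals a supposedly mandatory identifier missing from some entities.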

You can also provide the OWL ontology to Sparnatural along with the SHACL specification. Simply pass the 2 files in the configuration of the component. The ontology can be leveraged for 2 things : the hierarchy of classes and properties from the ontology can be used in the Sparnatural UI, and all labels and comments of classes and properties can be read and displayed in the UI, giving nice defaults if they are not present in the SHACL spec.

This automated SHACL generation was the one used for the Nakala Sparnatural query interface, and here is what the properties look like; you can see the list is organized according to the DublinCore properties hierarchy:

Grab an off-the-shelf SHACL specification

Some clients do write and publish SHACL specifications. Yes. This is the case, for example, of the European Parliament in their open-data portal. They publish the documentation of their data model with our SHACL Play documentation generator. The specification existed long before we tried Sparnatural on their data. We simply took their specification, loaded it in Sparnatural and… voilà ! It worked seamlessly, but to make it nicer we added, in a separate file, additional icons and tooltip information.

SHACL Synergies

There are many synergies to be found with a SHACL configuration in a model-driven perspective. You can publish the documentation, validate data, power your Sparnatural UI, and generate your API JSON schema from the same specification file. And probably much more – what would YOU do with a SHACL specification of your knowledge graph ?

The post Sparnatural SHACL configuration : manual, automated, off-the-shelf first appeared on Sparna Blog.

Sparnatural : say it with SHACL !

Marie Muller — 2024-10-15T16:02:59Z

Do you Sparnatural ? If you follow us here, you may be familiar with our best-known tool, the Sparnatural visual query builder. If not, have a look at the website and give us your impressions of it !

To make it short, Sparnatural is a client-side component that allows non-expert users to explore an RDF Knowledge Graph by building SPARQL queries with little effort.

Fully configurable – and customizable – it can be plugged into any existing SPARQL endpoint, and no additional server is required to adapt it to your knowledge graph ontology.

Innovative and intuitive, it aims at bringing your knowledge graph to your end-users in a visual way that « gamifies » the knowledge graph experience.

Nb : Sparnatural is open source, under an LGPL-3.0 license.

So far, the configuration was made through an OWL ontology…

Sparnatural in SHACL !

… until now,

But the times they are a-changin’ …

You can now configure Sparnatural starting from a SHACL configuration spreadsheet !

SHACL in a nutshell

Defined by a W3C Working Group, SHACL, short for « Shapes Constraint Language », is « a language for validating RDF graphs against a set of conditions. These conditions are provided as shapes and other constructs expressed in the form of an RDF graph. »

First published in 2017, it has become a widely used standard to :

describe structural constraints on data graphs ;
validate that data graphs satisfy a set of conditions ;
but also build user interfaces, generate code and integrate data !

It is this last family of uses that we leverage for our brand new Sparnatural SHACL configuration.

Yes, in a spreadsheet !

SHACL may be quite unfamiliar to our users.

The good thing is that you don’t need to be a SHACL expert to build your SHACL-shaped Sparnatural configuration.

Indeed ! The entire configuration is done via a spreadsheet whose columns correspond to the SHACL model.

Still, you can observe that all the Sparnatural features are here :

the nodes & the edges of the knowledge graph, of course ;
its labels and literal attributes (different kinds of notes) ;
but also the Sparnatural search widgets, icons, etc.

Give it a try !

Go to the DBpedia Museums demo :

Navigate the graph

Start by picking a class from the list and navigate through the properties to another class of the graph, search for a value…

… then click on the arrow to launch the query ▶️

Click on « Toggle SPARQL editor » below the query builder to display the corresponding SPARQL query :

… needless to say, you can create even more elaborate queries, just by adding new parameters when navigating through the knowledge graph !

Sample queries

To give you a quick overview, you can also try launching one of the sample queries we added to the demo.

On the screenshot below we can observe that this one is a rather more complex query, using an optional parameter, as we noticed that some values happen to be missing on DBpedia, either for Movements or Artworks…

Can we deduce that 19th-Century French women artists records are rather incomplete in English DBpedia ?

We’ll let you investigate on this point.

Multilingual

It is also possible to translate (and display) your configuration in any language of your choice, so that you can showcase your knowledge graph in different languages – even if the graph itself does not contain labels or values in this language…

Here translated in French :

Fully documented

This new version of Sparnatural comes with an extensive documentation of all the features that can be used to date, from basic installation to more advanced configuration of the tool.

Get started with Hello Sparnatural !

The post Sparnatural : say it with SHACL ! first appeared on Sparna Blog.

CORDIS : a SPARQL endpoint is born !

Marie Muller — 2024-01-15T08:55:41Z

Another star to light on EU’s linked open data maturity flag !

Not talking about 2024 exceptional Northern Lights to come, but this one’s also good news for science !

➡️ Late 2023, the Publications Office of the European Union announced on social media the public release of the new CORDIS SPARQL endpoint.

CORDIS, aka « the Community Research and Development Information Service of the European Commission », is « the […] primary source of results from the projects funded by the EU’s framework programmes for research and innovation, from FP1 to Horizon Europe ». Described as a « rich and structured public repository with all project information held by the European Commission such as project factsheets, participants, reports, deliverables and links to open-access publications », the CORDIS catalog has also been made available in 6 European languages by Publications Office’s editorial team.

The cherry on top of a whole process, the CORDIS SPARQL endpoint release crowns a long-term linked open data project, whose aim is to identify, acquire, preserve and provide access to knowledge, in a common will to share trustworthy, qualified and structured information with the widest possible public (see the Publications Office 2021 Annual Management Report).

In the context of the pandemic (and of the recent opening of data.europa.eu, the official portal for European data, as defined in the 2017–2025 European Open Data Space strategy), the EuroSciVoc taxonomy of fields of science was released in April 2020, followed in December 2021 by the publication of the European Research Information Ontology (EURIO) on the EU Vocabularies website.

As presented at the ENDORSE conference in March 2021, the redesign of the CORDIS data model in accordance with Semantic Web standards contributed to bringing the platform « from acting as a data repository to finally playing an active role as data provider », where the EuroSciVoc taxonomy & the EURIO ontology both played key roles in the creation of the future CORDIS knowledge graph and SPARQL endpoint :

« EuroSciVoc […] is a multilingual, SKOS-XL based taxonomy that represents all the main fields of science that were discovered from the CORDIS content, e.g., project abstracts. It was built starting from the hierarchy of the OECD’s Fields of R&D classification (FoRD) as root and extended through a semi-automatic process based on NLP techniques. It contains almost 1 000 categories in 6 languages (English, French, German, Italian, Polish and Spanish) and each category is enriched with relevant keywords extracted from the textual description of CORDIS projects. It is constantly evolving and is available on EU Vocabularies website […].

In order to transform CORDIS data into Linked Open Data, thus aligning with Semantic Web standards, best practices and tools in industry and public organizations, the need for an ontology emerged. CORDIS created the EURIO (European Research Information Ontology) based on data about research projects funded by the EU’s framework programmes for research and innovation. EURIO is aligned with EU ontologies such as DINGO and FRAPO and de facto standard ontologies such as schema.org and the Organization Ontology from W3C. It models projects, their results and actors such as people and organizations, and includes administrative information like funding schemes and grants.

EURIO, which is available on EU Vocabularies website, was the starting point to develop a Knowledge Graph of CORDIS data that will be publicly available via a dedicated SPARQL endpoint. »

(Enrico Bignotti & Baya Remaoun, « EuroSciVoc taxonomy and EURIO ontology: CORDIS as (semantic) data provider » , ENDORSE March 16, 2021. PDF VIDEO)

… A knowledge graph that was released soon after, in 2022-2023 (see INDUSTRY TRACK 1 on Tuesday, 25 October of the ISWC 2022 Conference for more detail), followed by the final opening of the CORDIS SPARQL endpoint in late November 2023.

Now fancy a few SPARQL queries in there ?

Follow the SPARQL

The CORDIS SPARQL endpoint is currently available on the CORDIS Datalab (and is already referenced in the EU Knowledge Graph among other European SPARQL endpoints ! see the query / see the results)

Here you can access a quick documentation guide to CORDIS Linked Open Data : https://cordis.europa.eu/about/sparql.

Let’s have a look at the EURIO ontology first : we need to understand it to query the CORDIS knowledge graph.

As we are told in the guide, the latest version can be downloaded from the EU Vocabularies website. When we unzip the archive we get the whole documentation about the EURIO classes & properties that we need to write our SPARQL queries – and a diagram of the main classes and properties of the CORDIS data model :

At first sight we can observe 3 main groups of entities on the schema :

On the top right, the projects & associated publications, the key resources of CORDIS ;
On the top left, the funding & grant materials, on the « monetary » side of the project ;
On the bottom, the organisations & persons involved, with their references & contact details.

Let’s open the CORDIS SPARQL endpoint – some easy queries can be run to begin exploring the CORDIS knowledge graph.

Nb : the data on the SPARQL endpoint is a snapshot, but fresher dumps can be found on the European data portal !

Here is a simple one to find a list of FundingSchemes with their titles and IDs corresponding to the « Horizon 2020 » programme :

FundingSchemes with their titles and IDs corresponding to « Horizon 2020 » programme

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?fs ?title ?id
WHERE {
# select all funding schemes …
?fs a eurio:FundingScheme.
# … with their title …
?fs eurio:title ?title.
# … and identifier …
?fs eurio:identifier ?id.
# where the identifier matches the regular expression "H2020"
FILTER (REGEX (?id, 'H2020'))
} LIMIT 100

▶️ See the results

The FILTER REGEX keeps only the funding schemes whose ID contains « H2020 ».

We can make another query to get the projects with the Funding Scheme Programme they are related to (note that, in EURIO, eurio:hasFundingSchemeProgramme is a sub-property of eurio:fundingScheme) :
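One detail worth keeping in mind : an unanchored regular expression matches anywhere in the string, so REGEX(?id, 'H2020') behaves like a « contains » test. The Python sketch below mimics this with `re.search`; the identifiers in the list are made up for the illustration and are not taken from CORDIS.

```python
import re

# Made-up identifiers, only for illustration (not actual CORDIS data)
funding_scheme_ids = ["H2020-EU.1.1.", "FP7-PEOPLE", "HORIZON.1.2", "ERC-H2020-x"]

# Unanchored pattern: matches wherever "H2020" occurs in the identifier
matches = [i for i in funding_scheme_ids if re.search("H2020", i)]

# Anchored pattern: only identifiers that start with "H2020"
prefixed = [i for i in funding_scheme_ids if re.search("^H2020", i)]

print(matches)
print(prefixed)
```

In SPARQL 1.1, CONTAINS(?id, 'H2020') or STRSTARTS(?id, 'H2020') express the same two intents without a regular expression.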

Projects with the Funding Scheme Programme they are related to

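PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>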
SELECT ?project ?acronym ?fundingscheme
WHERE {
# select the projects …
?project a eurio:Project.
# … with acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and corresponding funding scheme programmes
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
} LIMIT 100

▶️ See the results

(Here we used a property path with a « / » to shorten the query and get directly the acronyms of projects & the codes of Funding Scheme Programmes).

… and combining it with the first query, we can find the projects depending on the H2020 Funding Scheme Programme in particular :
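A « / » property path is simply a shorthand for a chain of triple patterns linked by intermediate variables. The small Python sketch below makes this expansion explicit; the function name and the `?_stepN` variable names are hypothetical, and it only handles plain sequences of prefixed names (no full IRIs in angle brackets, no other path operators).

```python
def expand_sequence_path(subject, path, obj):
    """Expand 'p1/p2/...' into the equivalent list of triple patterns."""
    steps = path.split("/")
    patterns = []
    current = subject
    for index, step in enumerate(steps):
        is_last = index == len(steps) - 1
        # Intermediate nodes get a fresh variable; the last step reaches the object
        target = obj if is_last else f"?_step{index + 1}"
        patterns.append(f"{current} {step} {target} .")
        current = target
    return patterns


for line in expand_sequence_path("?project", "eurio:hasAcronym/eurio:shortForm", "?acronym"):
    print(line)
```

Written this way, the acronym line of the query is two triple patterns : one from the project to its acronym node, one from that node to its short form.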

Projects depending on H2020 Funding Scheme Programme in particular

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?fundingscheme
WHERE {
# select the projects …
?project a eurio:Project.
# … with acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and corresponding funding scheme programmes codes …
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
# … with a filter on funding scheme codes 'H2020'
FILTER REGEX (?fundingscheme, 'H2020')
} LIMIT 100

▶️ See the results

It is also possible to get the list of all existing Funding Scheme Programmes that CORDIS projects have been funded by – we observe 27 of them here (from the SPARQL endpoint) – while adding a count function to know how many projects there are per FundingSchemeProgramme :

All existing Funding Scheme Programmes CORDIS projects have been funded by

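PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>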
# count the number of projects by funding scheme programme …
SELECT (COUNT (?project) as ?count) ?fundingscheme
WHERE {
# select the projects with corresponding funding scheme programmes codes …
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
# … counting projects per funding scheme programme
} GROUP BY ?fundingscheme
LIMIT 100

▶️ See the results

Querying the properties of organisations will return other kinds of useful information about the geographical location of the project stakeholders. Let’s imagine we want to find the projects whose coordinating organisations have sites located in France :

Projects whose coordinating organisations have sites located in France

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?role ?organisation ?country
WHERE {
# select the projects with their acronyms …
?project a eurio:Project.
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and organisations with 'coordinator' role and name …
?project eurio:hasInvolvedParty ?organisationrole.
?organisationrole eurio:roleLabel ?role.
?organisationrole eurio:roleLabel "coordinator".
?organisationrole eurio:isRoleOf/eurio:legalName ?organisation.
# … with address country for the sites defined at 'FR'
?organisationrole eurio:isRoleOf/eurio:hasSite/eurio:hasAddress/eurio:addressCountry ?country.
VALUES ?country { 'FR' }
} LIMIT 100

▶️ See the results

Depending on available data, you can either query via PostalAddress info (eurio:addressCountry ‘FR’) or AdministrativeArea (eurio:hasGeographicalLocation) … Here we’re lucky as both fields are mandatory ones.

Last but not least, we can also play with CORDIS vocabularies : here you’ll have the choice to investigate via plain keywords of Projects or Publications items, querying titles, abstracts or other types of literals…

An example of projects with abstracts containing string ❄ ‘winter’ ❄ – the URL giving the exact link to the project online :

Looking for ❄ ‘winter’ ❄ in CORDIS projects abstracts (with nice URL to go)

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?abstract ?url
WHERE {
# select the projects with their acronyms and abstracts …
?project rdf:type eurio:Project.
?project eurio:hasAcronym/eurio:shortForm ?acronym.
?project eurio:abstract ?abstract.
# … with a filter on abstracts containing string 'winter' case insensitive …
FILTER (regex(str(?abstract), 'winter', 'i'))
# … generating proper CORDIS website URLs based on RCN project code
?project eurio:rcn ?rcn.
BIND(IRI(CONCAT('https://cordis.europa.eu/project/rcn/', ?rcn)) AS ?url)
} LIMIT 100

▶️ See the results

But the most fun way is to use the EuroSciVoc taxonomy (and navigate through the thesaurus hierarchy) : to do so we need to go through the property « eurio:hasEuroSciVocClassification » to reach the Concepts, then their skosxl:prefLabel property … to finally obtain the thesaurus labels (don’t forget to choose a preferred language with a FILTER on lang) :

Projects with their associated EuroSciVoc keywords (English prefLabels )

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?ESV
WHERE {
# select the projects with their acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … with EuroSciVoc Classification prefLabels …
?project eurio:hasEuroSciVocClassification/skosxl:prefLabel/skosxl:literalForm ?ESV.
# … only returning 'English' prefLabels
FILTER (lang(?ESV) = 'en')
} LIMIT 100

▶️ See the results

A slightly more complex one, using the first level of the taxonomy hierarchy : here we search for all concepts « with no broader concept » (the FILTER NOT EXISTS clause), i.e. the top concepts or root concepts of the vocabulary used to describe the projects, and then count the projects in each category :

All root categories of EuroSciVoc used to describe the projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc top categories …
SELECT (COUNT(?project) AS ?nbProject) ?ESV_root_label
WHERE {
# … the top categories are Concepts …
?ESV_root a skos:Concept .
# … with no broader Concept …
FILTER NOT EXISTS { ?ESV_root skos:broader ?anything }
# … list with corresponding projects …
?ESV_root ^skos:broader*/^eurio:hasEuroSciVocClassification ?project .
# … and EuroSciVoc corresponding skos-xl prefLabels …
?ESV_root skosxl:prefLabel/skosxl:literalForm ?ESV_root_label.
# … grouping by EuroSciVoc category, with English prefLabels
FILTER (lang(?ESV_root_label) = 'en')
} GROUP BY ?ESV_root_label
LIMIT 100

▶️ See the results
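The logic of the root-category query above (find the concepts without a broader concept, then walk down the hierarchy to the projects) can be mirrored on in-memory data. The Python sketch below is only an illustration with hypothetical names and sample data; it assumes at most one broader concept per concept, and it counts each project once per root, which corresponds to COUNT(DISTINCT ?project) rather than the plain COUNT used in the query.

```python
from collections import defaultdict


def count_projects_by_root(broader, classification):
    """Count distinct projects under each root concept.

    broader: dict child concept -> parent concept (child skos:broader parent)
    classification: dict project -> set of concepts it is classified with
    """
    def root_of(concept):
        seen = set()
        # Climb the hierarchy until a concept has no broader concept (a root)
        while concept in broader and concept not in seen:
            seen.add(concept)  # guards against accidental cycles
            concept = broader[concept]
        return concept

    projects_by_root = defaultdict(set)
    for project, concepts in classification.items():
        for concept in concepts:
            projects_by_root[root_of(concept)].add(project)
    return {root: len(projects) for root, projects in projects_by_root.items()}


# Hypothetical sample data
broader = {
    "ex:physics": "ex:natural_sciences",
    "ex:optics": "ex:physics",
    "ex:history": "ex:humanities",
}
classification = {
    "p1": {"ex:optics"},
    "p2": {"ex:physics", "ex:optics"},
    "p3": {"ex:history"},
}
counts = count_projects_by_root(broader, classification)
print(counts)
```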

… and perhaps even more telling results when refined to level 2 of the hierarchy :

All 'level 2' categories of EuroSciVoc used to describe the projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc level 2 categories …
SELECT (COUNT(?project) AS ?nbProject) ?ESV_root_label ?ESV_level2_label
WHERE {
# … the top categories are Concepts …
?ESV_root a skos:Concept .
# … with no broader Concept …
FILTER NOT EXISTS { ?ESV_root skos:broader ?anything }
# … list level 2 category below level 1 with corresponding projects …
?ESV_root ^skos:broader ?ESV_level2 .
?ESV_level2 ^skos:broader*/^eurio:hasEuroSciVocClassification ?project .
# … and EuroSciVoc corresponding skos-xl prefLabels …
?ESV_root skosxl:prefLabel/skosxl:literalForm ?ESV_root_label.
?ESV_level2 skosxl:prefLabel/skosxl:literalForm ?ESV_level2_label.
# … grouping and sorting by EuroSciVoc category, with English prefLabels
FILTER (lang(?ESV_root_label) = 'en')
FILTER (lang(?ESV_level2_label) = 'en')
} GROUP BY ?ESV_root_label ?ESV_level2_label
ORDER BY ?ESV_root_label
LIMIT 100

▶️ See the results

And one last little query with a count, to list the EuroSciVoc Concepts most used for indexing projects :

Most used EuroSciVoc Concepts for indexing projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc Concept …
SELECT (COUNT (?project) as ?count) ?ESV
WHERE {
# … select the projects with their acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … with EuroSciVoc Classification prefLabels …
?project eurio:hasEuroSciVocClassification/skosxl:prefLabel/skosxl:literalForm ?ESV.
# … keeping English prefLabels only, most used Concepts first
FILTER (lang(?ESV) = 'en')
} GROUP BY ?ESV
ORDER BY DESC(?count)
LIMIT 3000

▶️ See the results

This one might be ideal to generate a word cloud, don’t you think ?

What if we send the CSV data to some nice online word cloud generator, then ?

(OMG they also have a shooting star shape in there 🤩)

As a conclusion…

According to Science (CORDIS saying !), New Year’s resolutions are hard to keep… because most of the time they are too ambitious, too restrictive or imprecisely formulated : indeed, « the effectiveness of resolutions depends on how they are framed. »

Horizon 2024, let’s suggest a(n RDF ?) well-framed one : may the CORDIS SPARQL endpoint initiative be an example for other organisations that want to share Linked Open Data !

Wishing you Best Interoperability and a Very Merry ✨ Sparqling New Year ! ✨

The post CORDIS : a SPARQL endpoint is born ! first appeared on Sparna Blog.

2013-2023 : ‘Tis SKOSPlay!’s Birthday !

Marie Muller — 2023-03-13T14:28:53Z

Hi, it’s Marie (aka chutjetweet here). In short, I’m a documentalist, terminologist, old (linked – open) data maniac & lil’ onto-Padawan and… I just joined Sparna’s team in early January !

To inaugurate my first article on Sparna’s blog, let me share some feedback of mine about Sparna’s well-known SKOSPlay!, whose 10th birthday we celebrate this year !

10 years old, quite a historic tool ! But more relevant than ever, in a context where semantic technologies are back at the front of the stage, thanks to the growing interest of the digital humanities movement in data interoperability projects based on standardized knowledge structuring (the Wikipedia-Wikidata projects e.g., as semantic wiki devices), and also thanks to the latest progress of artificial intelligence, now able to process large amounts of data and soon to fully leverage the potential of ontologies and knowledge graphs.

From asking for a taxonomy to querying RDF files with an API…

That said, in practice, semantic web standards are not always easy to handle as a professional who is neither initiated to SPARQL nor a seasoned data scientist, even when you only have to deal with a simple structured list of terms !

Either your data is already SKOS-standardized (great !), and there is sometimes a gap between the normalization step and the visualization step that requires somewhat more technical IT skills. Or – most of the time – the common muggle-born starts with a plain Excel spreadsheet, creates a list, adds some hierarchy, maybe some scope notes or definitions and… ends up puzzled, wondering how to get a 5-star data vocabulary ⭐ !

A SKOSPlay!-within-a-SKOSPlay!

Wink to @belett, anything possible now with SKOSPlay!

Aimed at visualizing (and printing !) SKOS thesauri, taxonomies and vocabularies at the very beginning, SKOSPlay! is a free, open source, fully online tool leveraging semantic technologies (RDF, SPARQL, inference, Linked Data) to generate downloadable HTML or PDF documents. More and more features have been added since then : display of alignments, processing of OWL and SKOS-XL files, generation of autocomplete fields and permuted indexes …

Hello @veronikaheim, maybe SKOSPlay! could match your need ?

… among other nice and useful developments.

But as an Excel aficionada, the one that I prefer is the Excel-to-RDF converter tool.

One sheet. One import. One result. Easy-peasy, happy terminologist :))

(And you can even keep your custom colors templates and formats !!! 🦄 )

Come on & let’s SKOSPlay!

Let’s imagine you want to display or build a small vocabulary that you could quickly visualize in a standardized, SKOS-structured way :

Now, to fit the SKOS model, your data has to follow a particular template that you can download from the SKOSPlay! website and fill in.

First you have to define the header of the template : the global scheme of your vocabulary, its URI, title and description :

Then add the terms of your list (with their URIs)… Here with the “@en” language indication on top of the column, as I am creating an English-French multilingual vocabulary :

Then recreate the tree structure in the Excel template (don’t mind my color palette, I always like colouring my Excel sheets to better visualize the info at a glance !).

The hierarchical broader-narrower structure is recreated by adding a “skos:narrower” column (or skos:broader, as you wish, with only 1 broader value per line) where you list the more specific values next to the more generic one (separated by commas). Here I used a PREFIX too, in order to shorten my http:// URIs; SKOSPlay! can process them anyway !

Then add a few notes and other information : multilingual values, skos:notation, any other default property known to the converter (see the documentation), or custom elements of your own by adding other PREFIXes :

Your Excel template is ready to go ! This is quite an easy configuration in my demo here, but SKOSPlay! can also deal with skos:Collections, SKOS-XL and other advanced RDF structures : blank nodes, RDF lists, named graphs. And it is now possible to generate OWL and SHACL files with the converter too !
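To picture what such a “skos:narrower” column means in terms of RDF, here is a small Python sketch that turns spreadsheet-like rows into triples, splitting the comma-separated cell and expanding the prefix. It does not reproduce the converter’s actual implementation; the column name “URI”, the `ex:` prefix and the sample rows are assumptions made for the illustration.

```python
# Hypothetical prefix declaration, standing in for the PREFIX row of the spreadsheet
PREFIXES = {"ex": "http://example.org/vocab/"}


def expand(curie):
    """Expand 'prefix:local' to a full URI when the prefix is declared."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local if prefix in PREFIXES else curie


def narrower_triples(rows):
    """Produce one (parent, skos:narrower, child) triple per value in the cell."""
    triples = []
    for row in rows:
        parent = expand(row["URI"])
        for value in row.get("skos:narrower", "").split(","):
            value = value.strip()
            if value:  # ignore empty cells and trailing commas
                triples.append((parent, "skos:narrower", expand(value)))
    return triples


rows = [
    {"URI": "ex:fruit", "skos:narrower": "ex:apple, ex:pear"},
    {"URI": "ex:apple", "skos:narrower": ""},
]
triples = narrower_triples(rows)
for triple in triples:
    print(triple)
```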

Now it’s time to turn your (finally-not-so-dirty-) data into a SKOS-charming file. Take your favorite ~~magic wand~~ SKOSPlay! Excel-to-RDF converter tool and load your Excel file in it (adding some optional parameters if needed).

Well done, it’s a wonderful RDF-ized vocabulary file (here in a Turtle format but you have also RDF/XML, N-Triples, N-Quads, N3 and TriG available) :

Wingardium Visualiza !

We’re almost done. Go back to the website, tab “Play!”, load your freshly RDF-serialized file and go to the next step to choose the kind of display you want, then finally press (SKOS)Play! and … abracadataaaaaaa !

There are many different options to visualize your tree-structured data : static and dynamic trees, but also more « professional » and printable kinds of display, like alphabetical, hierarchical or permuted views :

And KWIC (short for « KeyWord In Context ») :

It is even possible to load an online Google spreadsheet (mine is shared here), just by slightly adapting its URL to the converter’s needs. An interesting feature for collaboration, when you are building a vocabulary as a team !

The whole pack is fully documented and can be found on Sparna’s website & Git. Some recent users even produced a short video tutorial to show what they managed to do with the different SKOSPlay! visualization tools.

Already knew about SKOSPlay! ? Go and see its little brother, SHACLPlay!, and feel free to give us some feedback in the comments

Happy Birthday SKOSPlay! & Long live Semantic Web !

A bit more Vouvray with your nougat de Tours ?

The post 2013-2023 : ‘Tis SKOSPlay!’s Birthday ! first appeared on Sparna Blog.

Dashboards from SPARQL knowledge graphs using Looker Studio (Google Data Studio)

Thomas Francart — 2022-10-18T13:02:38Z

You want to showcase the content of your knowledge graph accessible in SPARQL ? You can easily use dashboard tools, such as Looker Studio (formerly Google Data Studio), which requires no development and is free to use. Of course, Sparnatural is another possible solution !

This guide describes every step you need to know in order to create a Looker Studio dashboard from SPARQL queries. Throughout, a running example illustrates the steps with screenshots, code and quotes.

Step 1 : Getting the SPARQL Connector

Looker Studio does not provide any native connector for SPARQL. But a community connector exists, called SPARQL Connector, made by Datafabrics LLC, that can be used to create the data source. You can find it by searching for community connectors, or use this link. The code is available in this Github repository.

You have to grant access to your Google account for SPARQL Connector before using it. You will be able to find it in the connectors panel, in the Partner Connectors section, for your next queries.

Step 2 : Connect your knowledge graph

From your report, click on “Add Data” on the bottom right of the screen to open the connector panel. Select the SPARQL Connector in the connector panel (you can also search for it by entering “sparql” in the search field).

Then, follow the steps to create your own data source:

Enter the URL of the SPARQL endpoint (the endpoint must be publicly accessible, without authentication), for example, with DBPedia:

https://dbpedia.org/sparql

Then enter the SPARQL query. For example, the following one selects capital cities with their label, the label of their country and their population:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?capital_city_label ?country_label  ?population
WHERE {
?capital_city  dbo:type dbr:Capital_city.
?capital_city rdfs:label ?capital_city_label.
?capital_city dbo:country ?country.
?country rdfs:label ?country_label.
OPTIONAL {?capital_city dbo:populationMetro ?population.}
FILTER (lang(?capital_city_label) = 'en')
FILTER (lang(?country_label) = 'en')
}

For each field in your query, you have to create one field in your data source and select its type. To do so, you have to build a schema like this one:

[{"name": "capital_city_label", "dataType": "STRING"},
    {"name": "country_label", "dataType": "STRING"},
    {"name": "population", "dataType": "NUMBER"}]

Make sure your “name” fields match the fields of your query, in the same order. You have to select the “dataType” you want for each of your fields, but you can change it later within Google Data Studio. Click here to learn more about data types.

Once every field is completed, you have to click twice on “Add”. If everything goes well, the connector panel will disappear and your new data source will appear on the right of the window, ready to use. It is named “SPARQL Connector” by default.

If you made a mistake while creating your data source, the SPARQL Connector panel can :

Show an error message indicating the error type (endpoint, for example)

Do nothing, in which case you will have to check your schema to be sure everything is correct.

Create a data source as expected, but Google Data Studio cannot use it and shows you this message on your chart :
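Since the schema has to list the query variables in the same order, it can be safer to derive it from the query rather than to type it by hand. The Python sketch below does that for simple queries; the function names are hypothetical, the variable extraction is a naive regular expression that does not handle expressions such as (COUNT(?x) AS ?count), and the data types still have to be supplied by you.

```python
import json
import re


def select_variables(query):
    """Return the variable names of the first SELECT clause, in order."""
    match = re.search(r"SELECT\s+(.*?)\s+WHERE", query, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no SELECT ... WHERE clause found")
    return re.findall(r"\?(\w+)", match.group(1))


def build_schema(query, data_types):
    """Build the connector schema, one entry per variable, in query order."""
    variables = select_variables(query)
    missing = [v for v in variables if v not in data_types]
    if missing:
        raise KeyError(f"no dataType given for: {missing}")
    return [{"name": v, "dataType": data_types[v]} for v in variables]


query = """
SELECT ?capital_city_label ?country_label  ?population
WHERE { ?capital_city ?p ?o }
"""
schema = build_schema(query, {
    "capital_city_label": "STRING",
    "country_label": "STRING",
    "population": "NUMBER",
})
print(json.dumps(schema, indent=2))
```

The printed JSON can then be pasted in the schema field of the connector.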

If you click on “See details” Google Data Studio will show you the error type from the connector :

Step 3 : Transform your data

First, you can change the name of your data source by clicking on the icon on the left of the data source on Google Data Studio (the icon will change into a pencil) to open the data source edition panel.

Then, click on the top left of the new panel where the name of your data source is to modify it.

Change name of the example data source to “Capital city Data (DBpedia)”

You can also change your data source by modifying your parameters in SPARQL Connector. To do so, click on “EDIT CONNECTION”. The SPARQL Connector panel will open with your current parameters and you can modify them.

In the data source edition panel, you can also change the type of your fields so it fits your needs (numbers can be changed as currency, text can be changed as geographic data, etc.).

Be careful with the format of your fields, otherwise you may not be able to use your data anymore. For example, if your values use “,” as a decimal separator, you can change the data type but you won’t be able to use this field as a number, because Google Data Studio uses “.” as a decimal separator.

The connector will also apply default values to query results that don’t have a value for a requested field. The default values are 0 for numbers, “” for strings and false for booleans.

The population field on DBpedia has some missing values, and the connector transformed all of them into default values (0 for numbers).

You may need to use calculated fields in order to obtain new fields or to transform data. To create one, click on “ADD A FIELD” on the right side of the same panel. Check the following page from the documentation to learn more about calculated fields.

By using a calculated field, the population data can be switched back to the original values.

In the new panel, choose the name of your new field, enter the formula. To ensure your formula is correct, a green check appears at the bottom of the panel. If not, it will turn into a red cross.

Enter the new field name: « population_recalculated ». Then enter the formula of the field: « NULLIF(population,0) ». With this formula, any value equal to 0 in the population field becomes a null value in the calculated field.
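For readers less familiar with NULLIF, here is a small Python sketch of the same logic. It only mimics the behaviour described above (return null when the value equals the given sentinel, otherwise return the value unchanged); the `nullif` helper and the sample values are hypothetical.

```python
def nullif(value, sentinel):
    """Return None when value equals sentinel, otherwise return value unchanged."""
    return None if value == sentinel else value


# Illustrative population values; 0 stands for "no value in the source"
populations = [1000, 0, 250]
population_recalculated = [nullif(p, 0) for p in populations]
```

Note that this treats every 0 as a missing value, so it is only appropriate for fields where 0 is not a meaningful value, as is the case for the population of a capital city.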

Step 4 : Improve performance with data extraction

Once you have created all your calculated fields, your data source may contain fields that you no longer need. Those fields may slow down your dashboard. You can use the “Extract Data” feature to keep only the fields you need in another data source, which you will then use to build your report.

To use it, click on “Add Data” on the bottom right of the screen and select “Extract Data”.

Then, select your data source and the fields you want to keep in your report. You can make several extractions from one data source if needed.

Choose the data source and keep only 3 fields: “capital_city_label”, “country_label” and “population_recalculated”.
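Conceptually, the extraction is a projection: each row is copied with only the selected fields. The Python sketch below illustrates the idea under that assumption; it says nothing about how Looker Studio stores extracts internally, and the `extract_fields` helper, the extra `unused_field` column and the sample row are invented for the example.

```python
KEPT_FIELDS = ("capital_city_label", "country_label", "population_recalculated")


def extract_fields(rows: list, fields: tuple) -> list:
    """Copy each row, keeping only the requested fields."""
    return [{field: row.get(field) for field in fields} for row in rows]


# Hypothetical source row with one field we do not need in the report
source_rows = [
    {
        "capital_city_label": "City A",
        "country_label": "Country A",
        "population": 0,
        "population_recalculated": None,
        "unused_field": "not needed in the report",
    }
]

extracted = extract_fields(source_rows, KEPT_FIELDS)
```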

You can also configure the auto-update tool to make sure your extracted data is up to date with the latest version of your data source from SPARQL Connector. In the bottom right of the panel, switch on the auto-update button, then choose the frequency of the update (daily, weekly or monthly).

A data source, named “Extract Data” by default, appears with the fields you selected from the previous data source.

This method only works for data sources; you won’t be able to use it on blended data. Make sure to do the extraction before blending to improve your performance. To learn more about blending, see this page from the Looker Studio documentation.

Step 5 : Create your dashboard

Here is a quick guide on how to create a chart in Google Data Studio. Check the chart reference documentation for more information about charts available by default.

To build a dashboard, you will need to select a widget first (pie chart, table, histogram, etc.). Click on “Add a chart” at the top of the screen and select the one you need.

Click on “Add a chart” and select a pie chart.

Select your chart on the report. This opens a panel on the right side of the screen where you can see the chart type and modify it. You can select the data to display in the “SETUP” panel. You can also customize the chart with the “STYLE” panel.

Place the chart anywhere you want on your dashboard. Google Data Studio will automatically choose the data source and some fields that fit the chart, but you can modify them in the “SETUP” panel on the right.

Choose “capital_city_label” as dimension and “population_recalculated” as metric.

Here is the result of this configuration:

In the “STYLE” panel, you can choose to modify some options in the chart to customize it.

Change the number of slices from 10 to 6 to see the top 5 values plus an “others” slice.
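The effect of the slice setting can be sketched in Python as follows: with 6 slices, the 5 largest values are kept and everything else is summed into a single “others” slice. This is an illustration of the behaviour described above, not Looker Studio's implementation; the `top_slices` helper and the sample values are hypothetical.

```python
def top_slices(values: dict, max_slices: int) -> dict:
    """Keep the (max_slices - 1) largest values and sum the rest into "others"."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    head = ranked[: max_slices - 1]
    rest = ranked[max_slices - 1 :]
    slices = dict(head)
    if rest:
        slices["others"] = sum(value for _, value in rest)
    return slices


# Illustrative populations for seven made-up capital cities
populations = {"A": 70, "B": 60, "C": 50, "D": 40, "E": 30, "F": 20, "G": 10}
pie = top_slices(populations, max_slices=6)
```

Here the two smallest cities, F and G, end up merged into the “others” slice.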

The chart updates automatically as you change the parameters.

Congratulations, you have successfully made your first chart! Try to get your own data sources with SPARQL Connector, make your own dashboards with Looker Studio, and send us the links!

The post Dashboards from SPARQL knowledge graphs using Looker Studio (Google Data Studio) first appeared on Sparna Blog.