Sparna Blog | https://blog.sparna.fr | Web of data | Information architecture | Access to knowledge

CORDIS : a SPARQL endpoint is born !
Mon, 15 Jan 2024 | https://blog.sparna.fr/2024/01/15/cordis-a-sparql-endpoint-is-born/

Another star to light on the EU's linked open data maturity flag! 🌟

We're not talking about the exceptional Northern Lights to come in 2024, but this one is also good news for science!

➡️ In late 2023, the Publications Office of the European Union announced on social media the public release of the new CORDIS SPARQL endpoint.

CORDIS, aka "the Community Research and Development Information Service of the European Commission", is "the […] primary source of results from the projects funded by the EU's framework programmes for research and innovation, from FP1 to Horizon Europe". Described as a "rich and structured public repository with all project information held by the European Commission such as project factsheets, participants, reports, deliverables and links to open-access publications", the CORDIS catalog has also been made available in 6 European languages by the Publications Office's editorial team.

Cherry on top 🍒 of a whole process, the CORDIS SPARQL endpoint release crowns a long-term linked open data project, whose aim is to identify, acquire, preserve and provide access to knowledge, in a shared will to offer the widest possible public trustworthy, qualified and structured information (see the Publications Office's 2021 Annual Management Report).

In the context of the pandemic (and the recent opening of data.europa.eu, the official portal for European data, as defined in the 2017–2025 European Open Data Space strategy), the EuroSciVoc taxonomy of fields of science was released in April 2020, followed in December 2021 by the publication of the European Research Information Ontology (EURIO) on the EU Vocabularies website 🌐.

As presented at the ENDORSE conference in March 2021, the redesign of the CORDIS data model in accordance with Semantic Web standards helped bring the platform "from acting as a data repository to finally playing an active role as data provider", with the EuroSciVoc taxonomy & the EURIO ontology both playing key roles in the creation of the future CORDIS knowledge graph and SPARQL endpoint:

🔸 EuroSciVoc […] is a multilingual, SKOS-XL based taxonomy that represents all the main fields of science that were discovered from the CORDIS content, e.g., project abstracts. It was built starting from the hierarchy of the OECD’s Fields of R&D classification (FoRD) as root and extended through a semi-automatic process based on NLP techniques. It contains almost 1 000 categories in 6 languages (English, French, German, Italian, Polish and Spanish) and each category is enriched with relevant keywords extracted from the textual description of CORDIS projects. It is constantly evolving and is available on EU Vocabularies website […].

🔸 In order to transform CORDIS data into Linked Open Data, thus aligning with Semantic Web standards, best practices and tools in industry and public organizations, the need for an ontology emerged. CORDIS created the EURIO (European Research Information Ontology) based on data about research projects funded by the EU’s framework programmes for research and innovation. EURIO is aligned with EU ontologies such as DINGO and FRAPO and de facto standard ontologies such as schema.org and the Organization Ontology from W3C. It models projects, their results and actors such as people and organizations, and includes administrative information like funding schemes and grants.

👉 EURIO, which is available on the EU Vocabularies website, was the starting point to develop a Knowledge Graph of CORDIS data that will be publicly available via a dedicated SPARQL endpoint.

(Enrico Bignotti & Baya Remaoun, "EuroSciVoc taxonomy and EURIO ontology: CORDIS as (semantic) data provider", ENDORSE, March 16, 2021. PDF VIDEO)

… A knowledge graph that was released over 2022-2023 (see Industry Track 1 on Tuesday, 25 October of the ISWC 2022 conference for more detail), until the final opening of the CORDIS SPARQL endpoint in late November 2023.

Now, fancy a few SPARQL queries in there?

Follow the SPARQL 💫

The CORDIS SPARQL endpoint is made available on the CORDIS Datalab (and is already referenced in the EU Knowledge Graph among other European SPARQL endpoints! See the query / see the results).

Here you can access a quick documentation guide to CORDIS Linked Open Data: https://cordis.europa.eu/about/sparql.

Let's have a look at the EURIO ontology first: we need to understand it to query the CORDIS knowledge graph.

As we are told in the guide, the latest version can be downloaded from the EU Vocabularies website. When we unzip the archive, we get the whole documentation about the EURIO classes & properties that we need to write our SPARQL queries, and a diagram of the main classes and properties of the CORDIS data model:

[Diagram: main classes and properties of the EURIO data model (EURIO v2.4)]

At first sight we can observe 3 main groups of entities on the diagram:

  • On the top right, the projects & associated publications, the key resources of CORDIS;
  • On the top left, the fundings & grants, on the "monetary" side of the projects;
  • On the bottom, the organisations & persons involved, with their references & contact details.

Let's open the CORDIS SPARQL endpoint: some easy queries can be run to begin exploring the CORDIS knowledge graph.
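As a first contact with the data, a generic exploration query can list the classes actually used in the endpoint, with their number of instances; this is standard SPARQL that works on any endpoint, not just this one:

# count the typed resources per class, most frequent classes first
SELECT ?class (COUNT(?instance) AS ?count)
WHERE {
?instance a ?class .
}
GROUP BY ?class
ORDER BY DESC(?count)
LIMIT 100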

NB: the data on the SPARQL endpoint is a snapshot, but fresher dumps can be found on the European data portal!

Here is a simple one to find a list of FundingSchemes, with their titles and IDs, corresponding to the "Horizon 2020" programme:

FundingSchemes with their titles and IDs corresponding to the "Horizon 2020" programme

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?fs ?title ?id
WHERE {
# select all funding schemes …
?fs a eurio:FundingScheme.
# … with their title …
?fs eurio:title ?title.
# … and identifier …
?fs eurio:identifier ?id.
# where the identifier matches the regular expression "H2020"
FILTER (REGEX(?id, 'H2020'))
} LIMIT 100

▶️ See the results

The FILTER REGEX enables us to display the IDs corresponding to H2020 Funding Schemes.
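If the identifiers always start with the programme code, a STRSTARTS filter is a simpler (and usually cheaper) alternative to the regular expression; a minimal variant of that last filter line:

# same restriction, without a regular expression
FILTER (STRSTARTS(STR(?id), 'H2020'))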

We can make another query to get the projects with the Funding Scheme Programme they are related to (note that, in EURIO, eurio:hasFundingSchemeProgramme is a sub-property of eurio:fundingScheme):

Projects with the Funding Scheme Programme they are related to

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?fundingscheme
WHERE {
# select the projects …
?project a eurio:Project.
# … with acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and corresponding funding scheme programmes
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
} LIMIT 100

▶️ See the results

(Here we used a property path with a "/" to shorten the query and get the projects' acronyms & the Funding Scheme Programme codes; see the expanded version below.)
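For comparison, here is the same query written without property paths, spelling out the intermediate resources explicitly (the intermediate variable names ?acronymResource, ?grant and ?programme are arbitrary):

PREFIX eurio: <http://data.europa.eu/s66#>
SELECT ?project ?acronym ?fundingscheme
WHERE {
# select the projects …
?project a eurio:Project.
# … the acronym is a resource carrying a short form …
?project eurio:hasAcronym ?acronymResource.
?acronymResource eurio:shortForm ?acronym.
# … and the funding scheme programme is reached through the grant
?project eurio:isFundedBy ?grant.
?grant eurio:hasFundingSchemeProgramme ?programme.
?programme eurio:code ?fundingscheme.
} LIMIT 100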

… and combining this with the first query, we can find the projects depending on the H2020 Funding Scheme Programme in particular:

Projects depending on the H2020 Funding Scheme Programme in particular

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?fundingscheme
WHERE {
# select the projects …
?project a eurio:Project.
# … with acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and corresponding funding scheme programmes codes …
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
# … with a filter on funding scheme codes 'H2020'
FILTER REGEX (?fundingscheme, 'H2020')
} LIMIT 100

▶️ See the results

It is also possible to get the list of all the Funding Scheme Programmes that CORDIS projects have been funded by (we observe 27 of them here, from the SPARQL endpoint), while adding a count to know how many projects there are per FundingSchemeProgramme:

All existing Funding Scheme Programmes CORDIS projects have been funded by

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by funding scheme programme …
SELECT (COUNT (?project) as ?count) ?fundingscheme
WHERE {
# select the projects with corresponding funding scheme programmes codes …
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
# … counting projects per funding scheme programme
} GROUP BY ?fundingscheme
LIMIT 100

▶️ See the results

Querying the organisation properties will return other kinds of useful information about the geographical location of the project stakeholders. Let's say we want to find the projects whose coordinating organisations have sites located in France:

Projects whose coordinating organisations have sites located in France 🐓

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?role ?organisation ?country
WHERE {
# select the projects with their acronyms …
?project a eurio:Project.
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and organisations with ‘coordinator’ role and name …
?project eurio:hasInvolvedParty ?organisationrole.
?organisationrole eurio:roleLabel ?role.
?organisationrole eurio:roleLabel "coordinator".
?organisationrole eurio:isRoleOf/eurio:legalName ?organisation.
# … with the sites' address country set to 'FR'
?organisationrole eurio:isRoleOf/eurio:hasSite/eurio:hasAddress/eurio:addressCountry ?country.
VALUES ?country { 'FR' }
} LIMIT 100

▶️ See the results

Depending on the available data, you can query either via the PostalAddress info (eurio:addressCountry 'FR') or via the AdministrativeArea (eurio:hasGeographicalLocation)… Here we're lucky, as both fields are mandatory; a sketch of the second option follows.
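A sketch of the AdministrativeArea variant, assuming the geographical location carries an rdfs:label (the exact properties on the AdministrativeArea side should be checked against the EURIO documentation):

# reach the administrative area of the organisation's site …
?organisationrole eurio:isRoleOf/eurio:hasSite/eurio:hasGeographicalLocation ?area.
# … and read its label (assumption: areas are labelled with rdfs:label)
?area rdfs:label ?areaLabel.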

Last but not least, we can also play with the CORDIS vocabularies: here you'll have the choice to investigate via plain keywords on Project or Publication items, querying titles, abstracts or other types of literals…

An example of projects whose abstracts contain the string ❄ 'winter' ❄, with the URL giving the exact link to the project online:

Looking for ❄ 'winter' ❄ in CORDIS project abstracts (with a nice URL to go)

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?abstract ?url
WHERE {
# select the projects with their acronyms and abstracts …
?project rdf:type eurio:Project.
?project eurio:hasAcronym/eurio:shortForm ?acronym.
?project eurio:abstract ?abstract.
# … with a filter on abstracts containing the string 'winter', case insensitive …
FILTER (regex(str(?abstract), 'winter', 'i'))
# … generating proper CORDIS website URLs based on RCN project code
?project eurio:rcn ?rcn.
BIND(IRI(CONCAT('https://cordis.europa.eu/project/rcn/', ?rcn)) AS ?url)
} LIMIT 100

▶️ See the results

But the funniest way is to use the EuroSciVoc taxonomy (and navigate through the thesaurus hierarchy): to do so we need to go through the property eurio:hasEuroSciVocClassification to reach the Concepts' skosxl:prefLabel property… and finally obtain the thesaurus labels (don't forget to choose a preferred language with a FILTER on lang):

Projects with their associated EuroSciVoc keywords (English prefLabels 💂)

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?ESV
WHERE {
# select the projects with their acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … with EuroSciVoc Classification prefLabels …
?project eurio:hasEuroSciVocClassification/skosxl:prefLabel/skosxl:literalForm ?ESV.
# … only returning English prefLabels
FILTER (lang(?ESV) = 'en')
} LIMIT 100

▶️ See the results

A slightly more complex one, using the first level of the taxonomy hierarchy: here we are searching for all concepts "with no broader concept" (the FILTER NOT EXISTS pattern), aka the top concepts or root concepts of the vocabulary used to describe the projects, then counting the projects in each category:

All root categories of EuroSciVoc used to describe the projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc top categories …
SELECT (COUNT(?project) AS ?nbProject) ?ESV_root_label
WHERE {
# … the top categories are Concepts …
?ESV_root a skos:Concept .
# … with no broader Concept …
FILTER NOT EXISTS { ?ESV_root skos:broader ?anything }
# … list with corresponding projects …
?ESV_root ^skos:broader*/^eurio:hasEuroSciVocClassification ?project .
# … and EuroSciVoc corresponding skos-xl prefLabels …
?ESV_root skosxl:prefLabel/skosxl:literalForm ?ESV_root_label.
# … keeping only English prefLabels
FILTER (lang(?ESV_root_label) = 'en')
} GROUP BY ?ESV_root_label
LIMIT 100

▶️ See the results

… and the results are perhaps more telling if refined to level 2 of the hierarchy 👀:

All 'level 2' categories of EuroSciVoc used to describe the projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc level 2 top categories …
SELECT (COUNT(?project) AS ?nbProject) ?ESV_root_label ?ESV_level2_label
WHERE {
# … the top categories are Concepts …
?ESV_root a skos:Concept .
# … with no broader Concept …
FILTER NOT EXISTS { ?ESV_root skos:broader ?anything }
# … list level 2 category below level 1 with corresponding projects …
?ESV_root ^skos:broader ?ESV_level2 .
?ESV_level2 ^skos:broader*/^eurio:hasEuroSciVocClassification ?project .
# … and EuroSciVoc corresponding skos-xl prefLabels …
?ESV_root skosxl:prefLabel/skosxl:literalForm ?ESV_root_label.
?ESV_level2 skosxl:prefLabel/skosxl:literalForm ?ESV_level2_label.
# … keeping only English prefLabels
FILTER (lang(?ESV_root_label) = 'en')
FILTER (lang(?ESV_level2_label) = 'en')
} GROUP BY ?ESV_root_label ?ESV_level2_label
ORDER BY ?ESV_root_label
LIMIT 100

▶️ See the results

And one last little one with a count, to enumerate the most used EuroSciVoc Concepts for indexing projects:

Most used EuroSciVoc Concepts for indexing projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc Concept …
SELECT (COUNT (?project) as ?count) ?ESV
WHERE {
#  … select the projects with their acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … with EuroSciVoc Classification prefLabels …
?project eurio:hasEuroSciVocClassification/skosxl:prefLabel/skosxl:literalForm ?ESV.
# … keeping only English prefLabels
FILTER (lang(?ESV) = 'en')
} GROUP BY ?ESV
ORDER BY DESC(?count)
LIMIT 3000

▶️ See the results

💡 This one would be an ideal one to generate a word cloud, maybe?

So what if we send the CSV data to some nice online word cloud generator?

[Image: word cloud of the most used EuroSciVoc keywords ("Cordis Taxo Cloud")]

(OMG they also have a shooting star shape 🌠 in there 🤩)

As a conclusion…

According to Science (so says CORDIS!), New Year's resolutions appear difficult to keep… because most of the time they are too ambitious, too restrictive or imprecisely formulated: indeed, "the effectiveness of resolutions depends on how they are framed."

Horizon 2024, let's suggest a well-framed (RDF?) resolution: may the CORDIS SPARQL endpoint initiative be an example for other organisations that want to share Linked Open Data!

Wishing you Best Interoperability and a Very Merry ✨ Sparqling New Year! ✨

2013-2023 : 'Tis SKOSPlay!'s Birthday !
Mon, 13 Mar 2023 | https://blog.sparna.fr/2023/03/13/2013-2023-tis-skosplays-birthday/

Hi, it's Marie (aka chutjetweet). In short, I'm a documentalist, terminologist, old (linked, open) data maniac & lil' onto-Padawan, and… I just joined Sparna's team this early January!

For my first article on Sparna's blog, let me share some feedback about Sparna's well-known SKOSPlay!, which celebrates its 10th birthday this year!

10 years old, quite a historic tool! Yet it is more relevant than ever, in a context where semantic technologies are taking the front of the stage anew: the digital humanities movement shows a growing interest in data interoperability projects based on standardized knowledge structuring (e.g. the Wikipedia-Wikidata projects, as semantic wiki devices), and the latest progress of artificial intelligence now makes it possible to process large amounts of data, and soon to fully leverage the potential of ontologies and knowledge graphs.

[Figure: From asking for a taxonomy to querying RDF files with an API]

That said, in practice, Semantic Web standards are not always easy to handle for a professional who is neither initiated to SPARQL nor a confirmed data scientist, even when all you have to deal with is a simple structured list of terms!

Either your data is already SKOS-standardized (great!), but there is sometimes a gap between the normalization step and the visualization step that requires a bit more technical IT skill. Or, most of the time, the common muggle-born starts with a plain Excel spreadsheet, creates a list, adds some hierarchy, maybe some scope notes or definitions, and… ends up quite puzzled, wondering how to get a 5-star vocabulary ⭐!

 


 

A SKOSPlay!-within-a-SKOSPlay!

[Screenshot: wink to @belett, anything possible now with SKOSPlay!]

Initially aimed at visualizing (and printing!) SKOS thesauri, taxonomies and vocabularies, SKOSPlay! is a fully online, free and open-source tool leveraging semantic technologies (RDF, SPARQL, inference, Linked Data) to generate downloadable HTML or PDF documents. More and more new features have been added since then: alignment display, OWL and SKOS-XL file processing, autocomplete fields and permuted index generation…

[Screenshot: hello @veronikaheim, maybe SKOSPlay! could match your need?]

… among other nice and useful developments.

But as an Excel aficionada, the one that I prefer is the Excel-to-RDF converter tool.

One sheet. One import. One result. Easy-peasy, happy terminologist :))

(And you can even keep your custom color templates and formats!!! 🦄)

 

Come on & let’s SKOSPlay!

Let's say you want to display or build a small vocabulary that you could quickly visualize in a standardized, SKOS-structured way:


Now, to fit the SKOS model, your data has to follow a particular template that you can download from the SKOSPlay! website and fill in.

First you have to define the header of the template: the global scheme of your vocabulary, its URI, title and description:


Add the terms of your list (with their URIs)… here with the "@en" language indication at the top of the column, as I am creating an English-French multilingual vocabulary:


Recreate the tree structure through the Excel template (don't mind my color palette, I always like colouring my Excel sheets to better visualize the info at a glance!).

The hierarchical broader-narrower structure is recreated by adding a "skos:narrower" column (or skos:broader if you prefer, with only one broader value per line) where you list the more specific values in front of the more generic one (separated by commas). Here I also used a PREFIX to shorten my http:// URIs; SKOSPlay! can process them either way!


Then add a few notes and other information: multilingual values, skos:notation, any other default property known to the converter (see the documentation), or custom elements of your own, by adding other PREFIXes:


Your Excel template is ready to go! It is quite an easy configuration in my demo here, but SKOSPlay! can also deal with skos:Collection, SKOS-XL and other advanced RDF structures: blank nodes, RDF lists, named graphs. It is now possible to generate OWL and SHACL files with the converter too!

Now it's time to turn your (finally-not-so-dirty-🐸) data into a SKOS-charming file. Take your favorite magic wand, the SKOSPlay! Excel-to-RDF converter tool, and load your Excel file into it (adding some optional parameters if needed).


Well done: it's a wonderful RDF-ized vocabulary file (here in Turtle format, but RDF/XML, N-Triples, N-Quads, N3 and TriG are also available):

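To give a flavour of the output, here is an illustrative SKOS snippet in the spirit of this example (the ex: namespace and the labels are made up; this is not the exact file produced by the converter):

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/vocab/> .

ex:scheme a skos:ConceptScheme ;
  skos:prefLabel "My small vocabulary"@en .

ex:animal a skos:Concept ;
  skos:inScheme ex:scheme ;
  skos:prefLabel "animal"@en, "animal"@fr ;
  skos:narrower ex:cat .

ex:cat a skos:Concept ;
  skos:inScheme ex:scheme ;
  skos:prefLabel "cat"@en, "chat"@fr ;
  skos:broader ex:animal .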

 

Wingardium Visualiza !

We're almost done. Go back to the website, to the "Play!" tab, load your freshly RDF-serialized file and go to the next step to choose the kind of display you want, finally press (SKOS)Play! and… abracadataaaaaaa!


Many different options are available to visualize your tree-shaped data: trees, static and dynamic, but also more "professional" and printable sorts of displays like alphabetical, hierarchical or permuted views:


And KWIC (as in "KeyWord In Context"):


 

It is even possible to load an online Google spreadsheet (mine is shared here), just by slightly adapting its URL for the converter's needs. An interesting feature for collaborative purposes, when you are building a vocabulary as a team!

The whole pack is fully documented and findable on Sparna's website & Git. Some recent users even produced a short video tutorial to show what they managed to do with the different SKOSPlay! visualization tools.

Already knew about SKOSPlay!? Go see its little brother, SHACLPlay!, and feel free to give us some feedback in the comments :)

Happy Birthday SKOSPlay! & Long live Semantic Web !

A bit more Vouvray with your nougat de Tours?


Sparnatural at SemWeb.pro 2022 on November 8
Mon, 31 Oct 2022 | https://blog.sparna.fr/2022/10/31/sparnatural-a-semweb-pro-2022-le-8-novembre/

The SemWeb.pro 2022 event takes place on November 8. I will be presenting Sparnatural there, along with the demonstrators built in 2022 for the Archives Nationales de France and the Bibliothèque Nationale de France. It will be a pleasure to see you there!

After the conference, the presentation materials will be available in the "bibliographie" section of the site.

I also take this opportunity to point you to the two latest demos built with Sparnatural:

Dashboards from SPARQL knowledge graphs using Looker Studio (Google Data Studio)
Tue, 18 Oct 2022 | https://blog.sparna.fr/2022/10/18/dashboards-from-sparql-knowledge-graphs-using-looker-studio-google-data-studio/

You want to demonstrate the content of your knowledge graph accessible via SPARQL? You can easily use dashboard tools such as Looker Studio (formerly Google Data Studio), which requires no development and is free to use. Of course, Sparnatural is another possible solution!

This guide describes every step you need in order to create a Looker Studio dashboard from SPARQL queries. All along, an example is shown to illustrate the steps, with screenshots, code snippets and quotes.

Step 1: Getting the SPARQL Connector

Looker Studio does not provide any native connector for SPARQL. But a community connector exists, called SPARQL Connector, made by Datafabrics LLC, that can be used to create the data source. You can find it by searching for community connectors, or use this link. The code is available in this Github repository.

You have to grant the SPARQL Connector access to your Google account before using it. You will then be able to find it in the connectors panel, in the Partner Connectors section, for your next queries.

Step 2: Connect your knowledge graph

From your report, click on "Add Data" on the bottom right of the screen to open the connector panel. Select the SPARQL Connector in the connector panel (you can also search for it by entering "sparql" in the search field).

Then, follow the steps to create your own data source:

  • Enter the URL of the SPARQL endpoint (the endpoint must be publicly accessible, without authentication), for example with DBpedia:
https://dbpedia.org/sparql
  • Then enter the SPARQL query, for example the following, which selects capital cities with their country label and their total metro population:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?capital_city_label ?country_label  ?population
WHERE {
?capital_city  dbo:type dbr:Capital_city.
?capital_city rdfs:label ?capital_city_label.
?capital_city dbo:country ?country.
?country rdfs:label ?country_label.
OPTIONAL {?capital_city dbo:populationMetro ?population.}
FILTER (lang(?capital_city_label) = 'en')
FILTER (lang(?country_label) = 'en')
}
  • For each field in your query, you have to create one field in your data source and select its type. To do so, you have to build a schema like this one:
[{"name": "capital_city_label", "dataType": "STRING"},
    {"name": "country_label", "dataType": "STRING"},
    {"name": "population", "dataType": "NUMBER"}]

Be sure your “name” fields match the fields you have on your query in the same order. You have to select the “dataType” you want for each of your fields, but you can change it later within Google Data Studio. Click here to learn more about data types.
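For instance, if the SELECT clause of the query also returned the ?country URI after ?population, the schema would need a matching fourth entry (a sketch):

[{"name": "capital_city_label", "dataType": "STRING"},
    {"name": "country_label", "dataType": "STRING"},
    {"name": "population", "dataType": "NUMBER"},
    {"name": "country", "dataType": "STRING"}]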


Once every field is completed, you have to click twice on "Add". If everything goes well, the connector panel will disappear and your new data source will appear on the right of the window, ready to use. It is named "SPARQL Connector" by default.

3 - article Dashboard

If you made a mistake while creating your data source, the SPARQL Connector panel can:

  • Show an error message indicating the error type (a wrong endpoint, for example)

  • Do nothing, in which case you will have to check your schema to make sure everything is correct.
  • Create a data source as it should, but Google Data Studio can't use it and shows you this message on your chart:

[Screenshot: error message displayed on the chart]

If you click on "See details", Google Data Studio will show you the error type reported by the connector:

[Screenshot: error details]

Step 3: Transform your data

First, you can change the name of your data source by clicking on the icon on the left of the data source in Google Data Studio (the icon will change into a pencil) to open the data source edition panel.

Then click on the top left of the new panel, where the name of your data source is, to modify it.

Change the name of the example data source to "Capital city Data (DBpedia)".

You can also change your data source by modifying your parameters in SPARQL Connector. To do so, click on “EDIT CONNECTION”. The SPARQL Connector panel will open with your current parameters and you can modify them.

In the data source edition panel, you can also change the type of your fields so it fits your needs (numbers can be changed as currency, text can be changed as geographic data, etc.).

Be careful with your field formats, or you may not be able to use your data anymore. For example, if you have "," as a decimal separator, you can change your data type but you won't be able to use this field, as Google Data Studio uses "." as a decimal separator.

The connector will also apply default values in query results which don’t have a value for a requested field. The default values are 0 for numbers, “” for strings and false for booleans.

The population field on DBpedia has some null values, but the connector transformed all these values into default values (0 for numbers).

You may need to use calculated fields in order to obtain new fields or to transform data. To create one,  click on “ADD A FIELD” on the right side of the same panel. Check the following page from the documentation to learn more about calculated fields.

By using a calculated field, the population data can be switched back to the original values.


In the new panel, choose the name of your new field and enter the formula. If your formula is correct, a green check appears at the bottom of the panel; if not, it turns into a red cross.

Enter the new field name: "population_recalculated". Then enter the formula of the field: "NULLIF(population,0)". In this case, any population value equal to 0 in the population field will turn into a null value in the calculated field.

Step 4: Improve performance with data extraction

Once you have created all your calculated fields, you may have some useless fields left in your data source. Those fields may slow down your dashboard. You can use "Extract Data" to keep only the fields you need in another data source that you will use to build your report.

To use it, click on "Add Data" on the bottom right of the screen and select "Extract Data".

Then, select your data source and the fields you want to keep in your report. You can make many extractions from one data source if you need.

Choose the data source and keep only 3 fields: "capital_city_label", "country_label" and "population_recalculated".

You can also configure the auto-update tool to make sure your extracted data is up to date with the latest version of your data source from the SPARQL Connector. In the bottom right of the panel, switch on the auto-update button, then choose the frequency of the update (daily, weekly or monthly).

A data source named "Extract Data" by default appears, with the fields you selected from the previous data source.

This method only works for data sources, you won’t be able to use it on blended data. Make sure to do the extraction before blending to improve your performance. To learn more about blending, see this page from the Looker Studio documentation.

Step 5: Create your dashboard

Here is a quick guide on how to create a chart in Google Data Studio. Check the chart reference documentation for more information about the charts available by default.

To build a dashboard, you first need to select a widget (pie chart, table, histogram, etc.). Click on "Add a chart" at the top of the screen and select the one you need.

Click on "Add a chart" and select a pie chart.

Select your chart on the report: a panel opens on the right side of the screen where you can see the chart type and modify it. You can select the data to display in the "SETUP" panel. You can also customize the chart with the "STYLE" panel.

Place the chart anywhere you want on your dashboard. Google Data Studio will automatically choose the data source and some fields that fit the chart, but you can modify them in the "SETUP" panel on the right.

Choose "capital_city_label" as dimension and "population_recalculated" as metric.

Here is the result of this configuration:

[Screenshot: the resulting pie chart]

In the "STYLE" panel, you can choose to modify some options in the chart to customize it.

Change the number of slices from 10 to 6 to see the top 5 values + an "others" value.

The chart will change automatically as you modify the parameters.

Congratulations, you have successfully made your first chart! Try to get your own data sources with the SPARQL Connector, make your own dashboards with Looker Studio, and send us the links!

Clean JSON(-LD) from RDF using Framing
Wed, 20 Jul 2022 | https://blog.sparna.fr/2022/07/20/clean-json-ld-from-rdf-using-framing/

Say you have a nice RDF knowledge graph based on an ontology, or maybe reusing ontologies, and maybe you have specified the structure of the knowledge graph with SHACL. And now you would like to expose your RDF as JSON in an API, for the average developer (or maybe you would like to produce a clean JSON to be indexed by Elastic). And the average developer (or Elastic) does not care about RDF and does not care about the "-LD" in "JSON-LD"; he just cares about JSON, and he is right! We are here to care about the "-LD" part for him.

So what you need is to produce a clean JSON structure from your raw RDF triples. And when I mean “clean”, I mean :

  • no URIs. Nowhere. No URIs in JSON keys, no URIs in the types of entities, no URIs in the values of properties controlled by a closed list; the only places where it is acceptable to see a URI are: to give the id of an entity, and when making a reference to such an entity id within the graph; and even in these cases the URIs can be shortened.
  • no fancy JSON-LD keys like @type, @value, @datatype, @id, etc.
  • indented.

You have 2 possibilities to do that:

  1. You develop a custom script, to either generate a JSON export of your data, or to implement the API that will query the knowledge graph, parse the triples, and generate that clean JSON output.
  2. You use JSON-LD framing to automate the production of a clean JSON(-LD) from RDF.

There are 2 nice things about the solution with JSON-LD framing:

  1. it can be automated
  2. you automatically retain the RDF compatibility – because your JSON will necessarily be JSON-LD. This means you can import your nice JSON directly in a triplestore.

The principle of JSON-LD framing is that you provide a JSON-LD @context together with an additional frame specification that defines how the JSON should be structured (indented), which entities to include at each level (entities can be filtered based on some criteria), and also which properties to include in each entity.
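Schematically, a frame is itself a single JSON-LD document: the @context, followed by a pattern describing the desired tree. A minimal skeleton, with just one term to show the shape (the real context and pattern are built step by step below):

{
  "@context": {
    "Concept": "http://www.w3.org/2004/02/skos/core#Concept"
  },
  "@type": "Concept"
}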

To start with JSON-LD framing, what you need is JSON-LD. Any JSON-LD. Typically the raw JSON-LD serialization that any RDF library or triplestore will produce: that ugly, messy, full-of-URIs-and-@language kind of JSON. So something like:

[Screenshot: raw JSON-LD serialization]

(Brrr, scary, no?)

And then what you need is the JSON-LD playground with the “Framed” tab. This will allow you to test your context and frame specification.

And when deployed in production, what you will need is a JSON-LD library that is capable of implementing the JSON-LD framing algorithm. Implementations are listed here, and you need an implementation compatible with JSON-LD 1.1.

Example files

As an example, I use a JSON-LD file from the French National Library, the one from Les Misérables here: https://data.bnf.fr/fr/13516296/victor_hugo_les_miserables/ (download link at the bottom of the page).

You can download the initial JSON example, the frame specification, and the result in a zip. The zip also contains intermediate frame specifications.

The @context

We’ll start by specifying the JSON-LD context part.

Map @type to type and @id to id

The average developer will wonder what those @type and @id keys are. Re-map them straight away to type and id:

"type" : "@type",
"id" : "@id",

Schema.org and lots of other specifications do that.

What about @graph?

If you have a named graph at the top, introduced by @graph, my suggestion would be to simply remap it to a fixed key, like "data" or "entities":

"data" : "@graph",

Map RDF properties URIs to JSON keys

Get rid of any trace of URIs or short URIs in JSON keys. Declare a term for every property in your graph. The simplest way to do this is to use the local part of the URI (after the last "#" or "/") as the term. Order the context by the alphabetical order of the terms. Terms for properties will usually start with a lowercase letter.

In corner cases you may end up with the same term twice (such as bnf-onto:subject and dcterms:subject in the example), in which case you need a different key; I chose "bnf-subject" here for bnf-onto:subject and kept "subject" for dcterms:subject.

"creator" : "dcterms:creator",
"date" : "dcterms:date",
"dateOfWork" : "rdagroup1elements:dateOfWork",
"depiction" : "foaf:depiction",
"description" : "dcterms:description",

Map class URIs to JSON terms

Now you want to do the same thing to get rid of any trace of URIs in the "type" of entities. Declare a term for every class in your ontology/application profile. List the classes in a different section than the properties. Terms for classes will usually start with an uppercase letter.

"Concept" : "skos:Concept",
"Document" : "foaf:Document",
"ExpositionVirtuelle" : "bnf-onto:ExpositionVirtuelle",

Declare object properties with "@type" : "@id"

Now you want to get rid of all those ugly "id" keys; we are only interested in listing the values. To do that, modify the mapping of the property (here "depiction") to state that its values are URIs. You need to change the mapping from

"depiction" : "foaf:depiction",

to

"depiction" : { "@value" : "foaf:depiction", "@type":"@id" },

And so parts like this:

"depiction": [
{
"id": "https://gallica.bnf.fr/ark:/12148/btv1b8438568p.thumbnail"
},
{
"id": "https://gallica.bnf.fr/ark:/12148/btv1b9004781d.thumbnail"
},
{
"id": "https://gallica.bnf.fr/ark:/12148/bpt6k5545348q.thumbnail"
}
]

Will be turned into

"depiction": [
"https://gallica.bnf.fr/ark:/12148/btv1b8438568p.thumbnail",
"https://gallica.bnf.fr/ark:/12148/btv1b9004781d.thumbnail",
"https://gallica.bnf.fr/ark:/12148/bpt6k5545348q.thumbnail",
"https://gallica.bnf.fr/ark:/12148/btv1b8438570r.thumbnail"
]

Map datatypes

Now you want to get rid of the datatype information on literals. If the values of a property always use the same datatype, which is the case 99.9% of the time, then you can change the mapping from

"property" : "http://myproperty",

to

"property" : { “@id”: "http://myproperty", “@type”:”xsd:date” }

(The example used does not have datatype properties.)

Map languages, with fixed language or when multilingual

Now let's get rid of the @language. For this you have 2 choices: when the language is always the same for the values, you can indicate it in the context, the same way you would do for the datatype, but with the @language key. So you change from

"description" : "dcterms:description",

to

"description" : { “@id” : "dcterms:description", “@language” : “fr” }

You could even have different terms for different languages, such as:

"title_fr" : { "@id" : "dcterms:title", "@language" : "fr" },
"title_en" : { "@id" : "dcterms:title", "@language" : "en" },
"title" : { "@id" : "dcterms:title" },

or, when you have multiple multilingual values, you can make the property a language map by declaring it this way:

"editorialNote" : { "@id" : "skos:editorialNote", "@container" : "@language" },

This turns the language code into a key in the JSON output:

"editorialNote": {
"fr": [
"BN Cat. gén. (sous : Hugo, comte Victor-Marie) : Les misérables. - . - BN Cat. gén. 1960-1969 (sous : Hugo, Victor) : idem. - . -",
"Laffont-Bompiani, Oeuvres, 1994. - . - GDEL. - . -"
] },

In that case, watch out for values that have no language tag: they will be grouped under a @none key, as in the fragment below.
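For instance, a language map mixing tagged and untagged values would come out like this (an illustrative fragment; the untagged value is made up):

"editorialNote": {
  "fr": [
    "Laffont-Bompiani, Oeuvres, 1994. - . - GDEL. - . -"
  ],
  "@none": [
    "a note that was stored without a language tag"
  ]
}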

Map controlled list values to JSON terms

By now you already get a much cleaner JSON, and almost all "unnecessary" URIs have disappeared. But we still have some URI references that we can clean up: the ones that are references to controlled lists with a finite number of values.

We can declare term mappings for those values just like we did to map properties and classes. BUT, and this is the trick, we need to change the property declaration from "@id" to "@vocab" for the replacement to happen. This is documented in the "Type coercion" section of the spec.

In our example, the mappings to languages and subjects are good candidates to be turned into JSON terms. So we change

"language" : { "@id" : "dcterms:language", "@type":"@id" },
"subject" : { "@id" : "dcterms:subject", "@type":"@id" },

to

"language" : { "@id" : "dcterms:language", "@type":"@vocab" },
"subject" : { "@id" : "dcterms:subject", "@type":"@vocab" },
"fre" : "http://id.loc.gov/vocabulary/iso639-2/fre",
"eng" : "http://id.loc.gov/vocabulary/iso639-2/eng",

Shorten remaining URI references

Now the only URIs left are the ids of the main entities in our graph, and references to those ids. References to controlled vocabularies with a limited number of values have been mapped to JSON terms. Although we cannot turn all the remaining URIs into JSON terms (because we can't declare all possible entity URIs in the context), we can shorten them by adding a prefix mapping in the context, in our case:

"ark-https": "https://data.bnf.fr/ark:/12148/",

(I note that there are http:// and https:// URIs in the data, I don’t know why)

 

The frame specification

So now we have clean values, no URIs, no fancy JSON-LD keys. But we still don’t have a structure indented the way the average developer would expect it; and this is where the frame specification comes into play.

Define indentation and filters (and reverse properties if needed)

The frame specification acts both as a filter/selection mechanism and as a structure definition. At each level you indicate the criteria for an object to be included. In our example we have a skos:Concept (the entry in the library catalog) whose foaf:focus is a Work (the book "in the real world"), and that skos:Concept is the subject of many virtual exhibits. We want to have the Concept and the Work at the first level, and the exhibits under the Concept. But there is a trick: it is the virtual exhibits that point to the concept with dcterms:subject, and we want it in the other direction, Concept is_subject_of Exhibit, so we need a @reverse property.

To do that, add the following reverse mapping declaration (don't modify the existing one):

"subject_of" : { "@reverse" : "dcterms:subject" },

Note the use of "@reverse" to indicate that this JSON key is to be interpreted from object to subject when turned into triples.

With that in place, we can write our frame specification, which goes right after the @context we have designed before:

"type" : ["Concept", "Work"],
"subject_of" : {
"type" : "ExpositionVirtuelle"
}

Note how we use the terms defined previously in the context. This is to be understood the following way: "at the first level, take any entity with a type of either Concept or Work, then insert a subject_of key and put inside it any value that has a type ExpositionVirtuelle". This guarantees the virtual exhibit objects will go under the Concept, and not above it or at the same level. But this is not sufficient: if you apply that framing, you will notice that the Work is repeated under the "focus" property of the Concept and at the root level. This is because of the default behavior of the JSON-LD playground regarding object embedding (objects are always embedded where they are referenced).

Avoid embedding

To avoid embedding when it is undesired, we can set the "@embed" option to "@never" on the "focus" property, like so:

"type" : ["Concept", "Work"],
"subject_of" : {
"type" : "ExpositionVirtuelle"
},
"focus" : {
"@embed" : "@never",
"@omitDefault": true
}

This tells the framing algorithm to never embed the complete entity inside the focus property, just reference the URI instead.

Also, you will notice the use of "@omitDefault" set to true; this tells the framing algorithm to omit the focus property when it has no value. Otherwise, since the Work does not have a foaf:focus property (only the Concept has one), it would get a "focus" key set to null.

What about the order of keys in the JSON?

Well, I am sure this can be controlled, either by explicitly specifying all the keys you want, in the order you want them, in the frame specification, or by using an "ordered" parameter of the JSON-LD API, but that is not available in the playground.

If you list all keys explicitly in the frame specification, don't forget to use wildcards so that any value will match; wildcards are empty objects, "{}":

"myProperty" : {}

The result

[Screenshot: the framed, clean JSON result]

Much nicer, no? This is something you can put into the hands of an average developer.

Automate context generation from SHACL

Do you have a SHACL specification of the structure of your graph? Wouldn't it be nice to automate the generation of the JSON-LD context from SHACL? Maybe we could do that in SHACL-Play? Stay tuned!

What we can probably automate is the context part, which can be global and unique for your whole graph; the framing specification, however, should probably be different for each API you need, each framing specification then referencing the same shared context by its URL, as sketched below.
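In practice, a frame document could then simply point to that shared context by URL instead of inlining it; something like this (the context URL is hypothetical):

{
  "@context": "https://example.org/my-graph-context.jsonld",
  "@type": ["Concept", "Work"]
}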

Image : [Encadrement ornemental] ([1er état]) / .Io. MIGon 1544. [Jean Mignon] ; [d’après Le Primatice] https://gallica.bnf.fr/ark:/12148/btv1b53230250h

Event: Sparnatural and the Archives Nationales + BnF demonstrators
Tue, 24 May 2022 | https://blog.sparna.fr/2022/05/24/evenement-sparnatural-demonstrateurs-archives-nationales-bnf/

The Archives nationales, the BnF and the Département du Numérique pour la transformation des politiques culturelles et l'administration des données (DEPNUM) of the French Ministry of Culture joined forces in 2021 to carry out a project with two goals:

  1. develop a new version of Sparnatural (http://sparnatural.eu/), an open-source SPARQL query editor (SPARQL being the language used to search RDF graphs);
  2. set up two fully operational web demonstrators, enabling intuitive exploration of and search through cultural metadata graphs via interfaces built with this tool.

The BnF thus built with Sparna a web demonstrator to query all of its RDF data, i.e. the entire content of data.bnf.fr. The Archives nationales (the Lab, in consultation with the Département du Minutier central des notaires de Paris) built with Sparna a web demonstrator to query the metadata describing one third of the notarial archives held at the Archives nationales, which they had previously converted to RDF according to the RiC-O ontology.

A half-day event on the afternoon of June 17 will present in detail the work carried out and the results obtained, then discuss the perspectives they open up for the partner institutions and for other projects.

It will take place in the auditorium of the Archives nationales at the Pierrefitte-sur-Seine site (see https://www.archives-nationales.culture.gouv.fr/fr/web/guest/site-de-pierrefitte-sur-seine), and will also be accessible remotely via a videoconferencing platform.

The programme of the Sparnatural half-day presentation can now be downloaded from the Archives nationales agenda (https://www.archives-nationales.culture.gouv.fr/fr/web/guest/235?sia-agenda-parameter=0010111111111) or directly at https://www.archives-nationales.culture.gouv.fr/documents/10157/277814/Programme+Sparnatural+2022/.

The half-day event is free and open to all, upon registration. To register, you can either use the form available at https://framaforms.org/inscription-a-la-demi-journee-de-restitution-du-projet-sparnatural-17-juin-2022-1652342598, or write to le-lab.archives-nationales@culture.gouv.fr, specifying whether you will come on site or attend remotely.

SPARQL training materials, CC-BY-SA
Tue, 19 Apr 2022 | https://blog.sparna.fr/2022/04/19/supports-de-formation-sparql-cc-by-sa/

Sparnatural video: BnF and AN prototypes
Fri, 04 Mar 2022 | https://blog.sparna.fr/2022/03/04/video-sparnatural-prototypes-bnf-et-an/

At the Europeana event "Building the common European data space for cultural heritage together" on March 1, 2022, I had the opportunity to show this video presenting the ongoing Sparnatural project with the Archives Nationales and the Bibliothèque Nationale:

The demonstrators shown in this video are not finalized (as of 2022-03-01). Further publications, events and communications should follow at the end of this project, from May 2022 onwards.

Fair Data Collective is doing cool things with SKOS Play and xls2rdf
Wed, 30 Jun 2021 | https://blog.sparna.fr/2021/06/30/fair-data-collective-is-doing-cool-things-with-skos-play-and-xls2rdf/

The FAIR Data Collective is doing cool things to enable researchers to easily publish their vocabularies as SKOS linked data while editing the vocabulary content in Excel spreadsheets, converted using the xls2rdf library from Sparna's SKOS Play. They turned the converter into a Github Actions pipeline: you push your Excel spreadsheet, based on a provided Excel template, to your Github repo, and abracadabra! You get a SKOS RDF file that can be loaded into a Fuseki instance, made visible in Skosmos, and even submitted to BioPortal or OntoPortal.

Here is also a nice video showing how to visualize such a SKOS vocabulary with the SKOS Play visualization tools.

Thanks to Nikola Vasiljevic and John Graybeal from the FAIR Data Collective for this nice integration!

You can check out the Fair Data Collective page on LinkedIn: "Making practical and easy-to-use FAIR data solutions".

Feeding Talend with SPARQL (on Wikidata)
Wed, 24 Mar 2021 | https://blog.sparna.fr/2021/03/24/alimenter-talend-avec-sparql-sur-wikidata/

In the previous post, we saw how Talend could be used to convert existing data to RDF/XML in order to populate a knowledge graph. Here we will see… exactly the opposite! How to feed Talend with a SPARQL query? In other words, how your RDF knowledge graph can serve as the input of a data conversion process, to export tabular data, populate other databases, or be combined with other flows.

The principle is simple: execute a SPARQL query, then process the corresponding results to turn them into a data table. This data table can then be exported, combined, or saved however you like.

To illustrate this, we will query Wikidata through its SPARQL query service, using its first example query, which retrieves… cats!

The query is the following, and here is the direct link to run it on Wikidata:

SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q146.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10

You can download the job presented here from this example Github repository, and import it directly into Talend.

We will use the following Talend components:

[Screenshot: overview of the Talend job flow]

Creating a Job

To begin, you must create a new job.

  • Right-click on Jobs in the left panel and choose the Create Job option.

  • Fill in the required fields of the New Job window and click the Finish button.

Calling SPARQL with tRESTClient

  • Add the tRESTClient component to the job.
  • Double-click the tRESTClient component, go to its Basic settings and fill in the following parameters:
    1. URL: the URL of the Wikidata SPARQL service is https://query.wikidata.org/sparql .
    2. HTTP Method: choose the GET HTTP method.
    3. Query parameters: we need to add the "query" parameter: click the plus [+] button and enter "query" in the Name column. In the Value column you will enter the SPARQL query.

Warning! The SPARQL query is a Java string, so you must: 1/ wrap it in double quotes, 2/ add the escape character \ before the double quotes inside the query, and 3/ write the query on a single line. Here is the corresponding string:

"SELECT ?item ?itemLabel WHERE { ?item wdt:P31 wd:Q146. SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". } } LIMIT 10"

Transforming the SPARQL results with tExtractXMLField

  • Add a tExtractXMLField component to the job.
  • Connect the tRESTClient component to the tExtractXMLField.

Let's configure the tExtractXMLField:

  1. Click the Edit schema button to open the window listing the input and output columns of the tExtractXMLField component.
  2. Add two new columns, Uri and Label, of type String, with the plus [+] button, then click the Ok button.
  3. Go to Basic settings and modify the following options:
  • XML field: choose the "body" field, which contains the response to the SPARQL call made by the previous component;
  • Loop XPath query: enter "/sparql/results/result", which is, according to the spec of the SPARQL results format, the path to each result row in the response.
  • The Mapping table: this is where everything happens!!! This mapping associates the columns of your SPARQL result with the output fields of the component, through XPath expressions:
    1. For the Uri column, the value of the XPath query column will be "binding[@name='item']/uri"
    2. For the Label column, the value of the XPath query column will be "binding[@name='itemLabel']/literal".
    3. If the SPARQL query returned more columns, the corresponding mappings would have to be added here to populate the other columns of the result.
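For reference, here is roughly what one result row looks like in the standard SPARQL XML results format; this is what the Loop XPath query and the two mappings above navigate (abridged, header and namespace declaration omitted):

<sparql>
  <results>
    <result>
      <binding name="item">
        <uri>http://www.wikidata.org/entity/Q378619</uri>
      </binding>
      <binding name="itemLabel">
        <literal xml:lang="en">CC</literal>
      </binding>
    </result>
  </results>
</sparql>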

Generating the output file

  • Add the tFileOutputDelimited output component.
  • Connect the tExtractXMLField component to the tFileOutputDelimited component.
  • Configure the component in the Basic settings section. File name: the path where you want to save the output file.

Running the Job

  • Go to the Run tab.
  • Click the Run button.

Enjoy!

Browse to the file location to retrieve it.

And here is the result:

 

Uri;Label
 http://www.wikidata.org/entity/Q378619;CC
 http://www.wikidata.org/entity/Q498787;Muezza
 http://www.wikidata.org/entity/Q677525;Orangey
 http://www.wikidata.org/entity/Q851190;Mrs. Chippy
 http://www.wikidata.org/entity/Q1050083;Catmando
 http://www.wikidata.org/entity/Q1201902;Tama
 http://www.wikidata.org/entity/Q1207136;Dewey Readmore Books
 http://www.wikidata.org/entity/Q1371145;Socks
 http://www.wikidata.org/entity/Q1386318;F. D. C. Willard
 http://www.wikidata.org/entity/Q1413628;Nora

You now know how to feed Talend from a SPARQL-accessible database, in a few clicks and without code! This is a way to get value from your knowledge graph by integrating it with the rest of the information system.

https://blog.sparna.fr/2021/03/24/alimenter-talend-avec-sparql-sur-wikidata/feed/ 0