CORDIS : a SPARQL endpoint is born !
Another star to light on EU’s linked open data maturity flag ! 🌟
We are not talking about the exceptional Northern Lights expected in 2024, but this one is good news for science too !
➡️ In late 2023, the Publications Office of the European Union announced on social media the public release of the new CORDIS SPARQL endpoint.
CORDIS, aka « the Community Research and Development Information Service of the European Commission », is « the […] primary source of results from the projects funded by the EU’s framework programmes for research and innovation, from FP1 to Horizon Europe ». Described as a « rich and structured public repository with all project information held by the European Commission such as project factsheets, participants, reports, deliverables and links to open-access publications », the CORDIS catalog has also been made available in 6 European languages by the Publications Office’s editorial team.
The cherry on top 🍒 of a long process, the CORDIS SPARQL endpoint release crowns a long-term linked open data project. Its aim : identifying, acquiring, preserving and providing access to knowledge, with a shared will to offer trustworthy, qualified and structured information to the widest possible public (see the Publications Office 2021 Annual Management Report).
In the context of the pandemic (and of the recent opening of data.europa.eu, the official portal for European data, as defined in the 2017–2025 European Open Data Space strategy), the EuroSciVoc taxonomy of fields of science was released in April 2020, followed in December 2021 by the publication of the European Research Information Ontology (EURIO) on the EU Vocabularies website 🌐.
As presented at the ENDORSE conference in March 2021, the redesign of the CORDIS data model in accordance with Semantic Web standards helped bring the platform « from acting as a data repository to finally playing an active role as data provider », with the EuroSciVoc taxonomy and the EURIO ontology both playing key roles in the creation of the future CORDIS knowledge graph and SPARQL endpoint :
🔸 EuroSciVoc […] is a multilingual, SKOS-XL based taxonomy that represents all the main fields of science that were discovered from the CORDIS content, e.g., project abstracts. It was built starting from the hierarchy of the OECD’s Fields of R&D classification (FoRD) as root and extended through a semi-automatic process based on NLP techniques. It contains almost 1 000 categories in 6 languages (English, French, German, Italian, Polish and Spanish) and each category is enriched with relevant keywords extracted from the textual description of CORDIS projects. It is constantly evolving and is available on EU Vocabularies website […].
🔸 In order to transform CORDIS data into Linked Open Data, thus aligning with Semantic Web standards, best practices and tools in industry and public organizations, the need for an ontology emerged. CORDIS created the EURIO (European Research Information Ontology) based on data about research projects funded by the EU’s framework programmes for research and innovation. EURIO is aligned with EU ontologies such as DINGO and FRAPO and de facto standard ontologies such as schema.org and the Organization Ontology from W3C. It models projects, their results and actors such as people and organizations, and includes administrative information like funding schemes and grants.
👉 EURIO, which is available on EU Vocabularies website, was the starting point to develop a Knowledge Graph of CORDIS data that will be publicly available via a dedicated SPARQL endpoint. »
(Enrico Bignotti & Baya Remaoun, « EuroSciVoc taxonomy and EURIO ontology: CORDIS as (semantic) data provider », ENDORSE, March 16, 2021, available as PDF and video)
… A knowledge graph that was then released in 2022-2023 (see Industry Track 1 on Tuesday, 25 October at the ISWC 2022 conference for more detail), followed by the final opening of the CORDIS SPARQL endpoint in late November 2023.
Now fancy a few SPARQL queries in there ?
Follow the SPARQL 💫
The CORDIS SPARQL endpoint is currently available on CORDIS Datalab (and already referenced in the EU Knowledge Graph among other European SPARQL endpoints : see the query and its results !)
Here you can access a quick documentation guide to CORDIS Linked Open Data : https://cordis.europa.eu/about/sparql.
Let’s have a look at the EURIO ontology first : we need to understand it in order to query the CORDIS knowledge graph.
As the guide explains, the latest version can be downloaded from the EU Vocabularies website. Unzipping the archive gives access to the whole documentation on the EURIO classes and properties that we need to write our SPARQL queries, along with a diagram of the main classes and properties of the CORDIS data model.
At first sight, we can observe 3 main groups of entities on the diagram :
- On the top right, the projects and their associated publications, the key resources of CORDIS ;
- On the top left, the funding and grants material, the « monetary » side of the project ;
- On the bottom, the organisations and persons involved, with their references and contact details.
Let’s open the CORDIS SPARQL endpoint : some easy queries can be run to begin exploring the CORDIS knowledge graph.
NB : the data on the SPARQL endpoint is a snapshot, but fresher dumps can be found on the European data portal !
Here is a simple one to find a list of FundingSchemes, with their titles and IDs, corresponding to the « Horizon 2020 » programme :
FundingSchemes with their titles and IDs corresponding to « Horizon 2020 » programme
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> |
The FILTER REGEX enables us to display the IDs corresponding to H2020 Funding Schemes.
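A minimal sketch of what such a query could look like, offered as an illustration rather than as the exact query run above. The class eurio:FundingScheme and the properties eurio:title and eurio:identifier are assumptions to check against the EURIO documentation, and the « H2020 » pattern is only a guess at how the IDs are written :

```sparql
PREFIX eurio: <http://data.europa.eu/s66#>

# Sketch only: eurio:FundingScheme, eurio:title and eurio:identifier
# are assumed names; check them against the EURIO documentation.
SELECT DISTINCT ?scheme ?title ?id
WHERE {
  ?scheme a eurio:FundingScheme ;
          eurio:title ?title ;
          eurio:identifier ?id .
  # Keep only the schemes whose ID mentions H2020 (case-insensitive match)
  FILTER REGEX(STR(?id), "H2020", "i")
}
ORDER BY ?id
LIMIT 100
```

The STR() call makes the regular expression work whether the ID is stored as a plain or a typed literal.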
We can write another query to get the projects together with the Funding Scheme Programme they are related to (note that in EURIO, eurio:hasFundingSchemeProgramme is a sub-property of eurio:fundingScheme) :
Projects with the Funding Scheme Programme they are related to
PREFIX eurio: <http://data.europa.eu/s66#> |
(Here we used a property path with a « / » to shorten the query to get the acronyms of projects & Funding Scheme Programmes codes).
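To illustrate the « / » property path, here is a hedged sketch. It assumes that eurio:Project, eurio:acronym and eurio:code exist under these names and that the programme is attached directly to the project; if your version of the model goes through an intermediate node (a grant, for instance), the path needs one more step :

```sparql
PREFIX eurio: <http://data.europa.eu/s66#>

# Sketch only: eurio:Project, eurio:acronym and eurio:code are assumed names,
# and the programme is assumed to be linked directly to the project.
SELECT ?acronym ?programmeCode
WHERE {
  ?project a eurio:Project ;
           eurio:acronym ?acronym ;
           # "/" chains two properties without naming the intermediate node
           eurio:hasFundingSchemeProgramme/eurio:code ?programmeCode .
}
LIMIT 100
```

Without the path, the same pattern would need two triples and an extra variable : one triple from the project to ?programme, and a second one from ?programme to its code.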
… and combining with the first query we can find the projects depending on H2020 Funding Scheme Programme in particular :
Projects depending on H2020 Funding Scheme Programme in particular
PREFIX eurio: <http://data.europa.eu/s66#> |
It is also possible to list all the Funding Scheme Programmes that have funded CORDIS projects (we observe 27 of them on the SPARQL endpoint), adding a count function to know how many projects each FundingSchemeProgramme has :
All existing Funding Scheme Programmes CORDIS projects have been funded by
PREFIX eurio: <http://data.europa.eu/s66#> |
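The counting part can be sketched as follows, under the same assumptions as before (eurio:Project as the project class, and the programme linked directly to the project) :

```sparql
PREFIX eurio: <http://data.europa.eu/s66#>

# Sketch only: assumes eurio:Project exists and that projects carry
# eurio:hasFundingSchemeProgramme directly.
SELECT ?programme (COUNT(DISTINCT ?project) AS ?projects)
WHERE {
  ?project a eurio:Project ;
           eurio:hasFundingSchemeProgramme ?programme .
}
GROUP BY ?programme
ORDER BY DESC(?projects)
```

COUNT(DISTINCT …) avoids counting a project twice if the same link is asserted more than once.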
Querying the properties of organisations returns another kind of useful information : the geographical location of the project stakeholders. Let’s imagine we want to find the projects whose coordinating organisations have sites located in France :
Projects whose coordinating organisations have sites located in France 🐓
PREFIX skos: <http://www.w3.org/2004/02/skos/core#> |
Depending on the available data, you can query either via the PostalAddress info (eurio:addressCountry ‘FR’) or via the AdministrativeArea (eurio:hasGeographicalLocation)… Here we’re lucky, as both fields are mandatory.
Last but not least, we can also play with CORDIS vocabularies : here you’ll have the choice to investigate via plain keywords of Projects or Publications items, querying titles, abstracts or other types of literals…
An example of projects with abstracts containing string ❄ ‘winter’ ❄ – the URL giving the exact link to the project online :
Looking for ❄ ‘winter’ ❄ in CORDIS projects abstracts (with nice URL to go)
PREFIX eurio: <http://data.europa.eu/s66#> |
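A keyword search of this kind could be sketched as below. The properties eurio:title, eurio:abstract and eurio:url are assumptions to verify in the EURIO documentation :

```sparql
PREFIX eurio: <http://data.europa.eu/s66#>

# Sketch only: eurio:title, eurio:abstract and eurio:url are assumed names.
SELECT ?project ?title ?url
WHERE {
  ?project a eurio:Project ;
           eurio:title ?title ;
           eurio:abstract ?abstract .
  # The link is optional so that projects without one are still returned
  OPTIONAL { ?project eurio:url ?url }
  # Case-insensitive substring match on the abstract text
  FILTER CONTAINS(LCASE(STR(?abstract)), "winter")
}
LIMIT 50
```

Note that CONTAINS matches substrings, so « wintering » or « midwinter » would be returned as well; a REGEX with word boundaries is an option when whole words are needed.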
But the most fun way is to use the EuroSciVoc taxonomy (and to navigate through the thesaurus hierarchy) : to do so, we follow the property « eurio:hasEuroSciVocClassification » to reach the Concepts, then their skosxl:prefLabel property, to finally obtain the thesaurus labels (don’t forget to choose a preferred language with a FILTER on the lang parameter) :
Projects with their associated EuroSciVoc keywords (English prefLabels 💂)
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#> |
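The navigation described above can be sketched like this. In SKOS-XL a label is a resource in its own right, so the text itself sits one step further, in skosxl:literalForm; eurio:Project and eurio:title are assumed names :

```sparql
PREFIX eurio:  <http://data.europa.eu/s66#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>

# Sketch only: eurio:Project and eurio:title are assumed names.
SELECT ?project ?title ?label
WHERE {
  ?project a eurio:Project ;
           eurio:title ?title ;
           eurio:hasEuroSciVocClassification ?concept .
  # SKOS-XL: prefLabel points to a label resource, literalForm holds its text
  ?concept skosxl:prefLabel/skosxl:literalForm ?label .
  # Keep the English labels only
  FILTER (LANG(?label) = "en")
}
LIMIT 100
```

Changing "en" to "fr", "de", "it", "pl" or "es" should return the labels in the other EuroSciVoc languages.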
A slightly more complex one, using the first level of the taxonomy’s hierarchy : here we are searching for all the skos:broader concepts « with no other broader concept » (the FILTER NOT EXISTS formula), aka the top concepts or root concepts of the vocabulary used to describe the projects. Then we count the projects in each category :
All root categories of EuroSciVoc used to describe the projects
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#> |
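One possible way to write the « no other broader concept » test is sketched here. It assumes that the skos:broader links of EuroSciVoc are loaded in the endpoint alongside the project data :

```sparql
PREFIX eurio:  <http://data.europa.eu/s66#>
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>

# Sketch only: assumes the skos:broader hierarchy is available in the endpoint.
SELECT ?root ?rootLabel (COUNT(DISTINCT ?project) AS ?projects)
WHERE {
  ?project eurio:hasEuroSciVocClassification ?concept .
  # Climb zero or more levels of the hierarchy
  ?concept skos:broader* ?root .
  # A root concept is one that has no broader concept itself
  FILTER NOT EXISTS { ?root skos:broader ?parent }
  ?root skosxl:prefLabel/skosxl:literalForm ?rootLabel .
  FILTER (LANG(?rootLabel) = "en")
}
GROUP BY ?root ?rootLabel
ORDER BY DESC(?projects)
```

The « * » path also keeps projects indexed directly with a root concept; replace it with « + » to require at least one step up the hierarchy.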
… and maybe even more explicit results if refined to level 2 of the hierarchy 👀 :
All ‘level 2’ root categories of EuroSciVoc used to describe the projects
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#> |
And one last little one with a count, to list the EuroSciVoc Concepts most used for indexing projects :
Most used EuroSciVoc Concepts for indexing projects
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#> |
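A ranking of this kind could be sketched as follows, reusing the SKOS-XL label pattern and grouping by label :

```sparql
PREFIX eurio:  <http://data.europa.eu/s66#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>

# Sketch only: counts the distinct projects indexed with each concept label.
SELECT ?label (COUNT(DISTINCT ?project) AS ?uses)
WHERE {
  ?project eurio:hasEuroSciVocClassification ?concept .
  ?concept skosxl:prefLabel/skosxl:literalForm ?label .
  FILTER (LANG(?label) = "en")
}
GROUP BY ?label
ORDER BY DESC(?uses)
LIMIT 100
```

The two result columns (label and number of uses) are the kind of input most word cloud tools expect once exported as CSV.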
💡 This one could be ideal for generating a word cloud, maybe ?
What if we send the CSV data to some nice online word cloud generator then ?
(OMG they also have a shooting star shape 🌠 in there 🤩)
As a conclusion…
According to Science (CORDIS says so !), New Year’s resolutions are difficult to keep… because most of the time they are too ambitious, too restrictive or imprecisely formulated : indeed, « the effectiveness of resolutions depends on how they are framed. »
Horizon 2024, let’s suggest a well-framed (RDF-framed ?) one : may the CORDIS SPARQL endpoint initiative be an example for other organisations that want to share Linked Open Data !
Wishing you Best Interoperability and a Very Merry ✨ Sparqling New Year ! ✨
These are some very nice queries!
But I have some bugs to post to OPOCE regarding their RDF representation, do you know if they have a github project?
Thank you Vladimir for your comment !
I’ve seen there is a contact address on CORDIS website, maybe you can start there ? It’s the same address as mentioned on EU Vocabularies website for CORDIS assets. Otherwise they seem pretty reactive via social media (try X : https://twitter.com/CORDIS_EU or Mastodon : https://mastodon.social/@CORDIS_EU@respublicae.eu).
Anyway a good thing to suggest to the team