CORDIS : a SPARQL endpoint is born !

Another star lights up on the EU’s linked open data maturity flag ! 🌟

We are not talking about the exceptional Northern Lights expected in 2024, but this is also good news for science !

➡️ Late 2023, the Publications Office of the European Union announced on social media the public release of the new CORDIS SPARQL endpoint.

CORDIS, aka « the Community Research and Development Information Service of the European Commission », is « the […] primary source of results from the projects funded by the EU’s framework programmes for research and innovation, from FP1 to Horizon Europe ». Described as a « rich and structured public repository with all project information held by the European Commission such as project factsheets, participants, reports, deliverables and links to open-access publications », the CORDIS catalog has also been made available in 6 European languages by the Publications Office’s editorial team.

The cherry on top 🍒 of a whole process, the CORDIS SPARQL endpoint release crowns a long-term linked open data project. Its aim : identifying, acquiring, preserving and providing access to knowledge, with a shared will to offer the widest possible public trustworthy, qualified and structured information (see the Publications Office 2021 Annual Management Report).

In the context of the pandemic (and of the recent opening of data.europa.eu, the official portal for European data, as defined in the 2017–2025 European Open Data Space strategy), the EuroSciVoc taxonomy of fields of science was released in April 2020, followed in December 2021 by the publication of the European Research Information Ontology (EURIO) on the EU Vocabularies website 🌐.

As presented at the ENDORSE conference in March 2021, the redesign of the CORDIS data model in accordance with Semantic Web standards helped bring the platform « from acting as a data repository to finally playing an active role as data provider », with the EuroSciVoc taxonomy & the EURIO ontology both playing key roles in the creation of the future CORDIS knowledge graph and SPARQL endpoint :

🔸 « EuroSciVoc […] is a multilingual, SKOS-XL based taxonomy that represents all the main fields of science that were discovered from the CORDIS content, e.g., project abstracts. It was built starting from the hierarchy of the OECD’s Fields of R&D classification (FoRD) as root and extended through a semi-automatic process based on NLP techniques. It contains almost 1 000 categories in 6 languages (English, French, German, Italian, Polish and Spanish) and each category is enriched with relevant keywords extracted from the textual description of CORDIS projects. It is constantly evolving and is available on EU Vocabularies website […].

🔸 In order to transform CORDIS data into Linked Open Data, thus aligning with Semantic Web standards, best practices and tools in industry and public organizations, the need for an ontology emerged. CORDIS created the EURIO (European Research Information Ontology) based on data about research projects funded by the EU’s framework programmes for research and innovation. EURIO is aligned with EU ontologies such as DINGO and FRAPO and de facto standard ontologies such as schema.org and the Organization Ontology from W3C. It models projects, their results and actors such as people and organizations, and includes administrative information like funding schemes and grants.

👉 EURIO, which is available on EU Vocabularies website, was the starting point to develop a Knowledge Graph of CORDIS data that will be publicly available via a dedicated SPARQL endpoint. »

(Enrico Bignotti & Baya Remaoun, « EuroSciVoc taxonomy and EURIO ontology: CORDIS as (semantic) data provider », ENDORSE, March 16, 2021. PDF, video)

… A knowledge graph that was released soon after, in 2022-2023 (see Industry Track 1 on Tuesday, 25 October at the ISWC 2022 conference for more detail), followed by the final opening of the CORDIS SPARQL endpoint in late November 2023.

Now fancy a few SPARQL queries in there ?

Follow the SPARQL 💫

The CORDIS SPARQL endpoint is currently available on CORDIS Datalab (and is already referenced in the EU Knowledge Graph among other European SPARQL endpoints ! See the query / see the results).

Here you can access a quick documentation guide to CORDIS Linked Open Data : https://cordis.europa.eu/about/sparql.

Let’s have a look at the EURIO ontology first : we need to understand it to query the CORDIS knowledge graph.

As the guide tells us, the latest version can be downloaded from the EU Vocabularies website. Unzipping the archive gives access to the whole documentation of the EURIO classes & properties that we need to write our SPARQL queries, along with a diagram of the main classes and properties of the CORDIS data model :

[Figure : EURIO v2.4, diagram of the main classes and properties]

At first sight, we can see 3 main groups of entities in the diagram :

  • At the top right, the projects & associated publications, the key resources of CORDIS ;
  • At the top left, the funding & grant material, the « monetary » side of the project ;
  • At the bottom, the organisations & persons involved, with their references & contact details.

Let’s open the CORDIS SPARQL endpoint : a few easy queries are enough to begin exploring the CORDIS knowledge graph.

NB : the data on the SPARQL endpoint is a snapshot, but the freshest dumps can be found on the European data portal !

Here is a simple one to list the FundingSchemes of the « Horizon 2020 » programme, with their titles and IDs :

FundingSchemes with their titles and IDs corresponding to « Horizon 2020 » programme

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?fs ?title ?id
WHERE {
# select all funding schemes …
?fs a eurio:FundingScheme.
# … with their title …
?fs eurio:title ?title.
# … and identifier …
?fs eurio:identifier ?id.
# where the identifier matches the regular expression 'H2020'
FILTER (REGEX (?id, 'H2020'))
} LIMIT 100

▶️ See the results

The FILTER with REGEX keeps only the Funding Schemes whose ID contains « H2020 ».

We can write another query to get the projects with the Funding Scheme Programme they are related to (note that, in EURIO, eurio:hasFundingSchemeProgramme is a sub-property of eurio:fundingScheme) :
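Since « H2020 » is a plain substring rather than a real pattern, the same filter can also be sketched with CONTAINS, a standard SPARQL 1.1 string function. This variant only reuses the properties of the query above ; the STR() call is a precaution, assuming the identifiers might carry a datatype or a language tag (not checked against the data here).

```sparql
PREFIX eurio: <http://data.europa.eu/s66#>
SELECT ?fs ?title ?id
WHERE {
  # same funding schemes, titles and identifiers as in the query above
  ?fs a eurio:FundingScheme ;
      eurio:title ?title ;
      eurio:identifier ?id .
  # plain substring test instead of a regular expression
  # (STR() is a precaution, see the assumption stated above)
  FILTER (CONTAINS(STR(?id), "H2020"))
} LIMIT 100
```

If the identifiers always begin with the programme code, STRSTARTS(STR(?id), "H2020") would be a stricter alternative ; whether that holds depends on how the IDs are formatted in the snapshot.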

Projects with the Funding Scheme Programme they are related to

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?fundingscheme
WHERE {
# select the projects …
?project a eurio:Project.
# … with acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and corresponding funding scheme programmes
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
} LIMIT 100

▶️ See the results

(Here we used a property path with a « / » to shorten the query when getting the acronyms of projects & the Funding Scheme Programme codes.)

… and by combining it with the first query, we can find the projects funded under the H2020 Funding Scheme Programme in particular :
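For readers new to property paths, a sequence path with « / » is shorthand for chaining triple patterns through intermediate nodes. Here is a sketch of the same query written out in full ; the variable names ?acronymNode, ?funding and ?programme are illustrative only, and nothing is assumed about the classes of these intermediate nodes.

```sparql
PREFIX eurio: <http://data.europa.eu/s66#>
SELECT ?project ?acronym ?fundingscheme
WHERE {
  ?project a eurio:Project .
  # eurio:hasAcronym/eurio:shortForm, one step at a time
  ?project eurio:hasAcronym ?acronymNode .
  ?acronymNode eurio:shortForm ?acronym .
  # eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code, one step at a time
  ?project eurio:isFundedBy ?funding .
  ?funding eurio:hasFundingSchemeProgramme ?programme .
  ?programme eurio:code ?fundingscheme .
} LIMIT 100
```

The expanded form is longer, but it becomes handy as soon as one of the intermediate nodes has to be returned or filtered on.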

Projects depending on H2020 Funding Scheme Programme in particular

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?fundingscheme
WHERE {
# select the projects …
?project a eurio:Project.
# … with acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and corresponding funding scheme programmes codes …
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
# … with a filter on funding scheme codes 'H2020'
FILTER REGEX (?fundingscheme, 'H2020')
} LIMIT 100

▶️ See the results

It is also possible to list all the Funding Scheme Programmes that CORDIS projects have been funded by (we observe 27 of them on the SPARQL endpoint), adding a COUNT function to know how many projects there are per FundingSchemeProgramme :

All existing Funding Scheme Programmes CORDIS projects have been funded by

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by funding scheme programme …
SELECT (COUNT (?project) as ?count) ?fundingscheme
WHERE {
# select the projects with corresponding funding scheme programmes codes …
?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme.
# … counting projects per funding scheme programme
} GROUP BY ?fundingscheme
LIMIT 100

▶️ See the results
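A possible refinement of the counting query above, given as a sketch : COUNT(?project) counts one row per matching path, so a project reaching the same programme code through several funding nodes would be counted more than once. Whether this actually happens in the CORDIS data is not verified here ; COUNT(DISTINCT …) simply rules it out, and ORDER BY puts the most used programmes first.

```sparql
PREFIX eurio: <http://data.europa.eu/s66#>
# count each project at most once per funding scheme programme code
SELECT ?fundingscheme (COUNT(DISTINCT ?project) AS ?count)
WHERE {
  ?project eurio:isFundedBy/eurio:hasFundingSchemeProgramme/eurio:code ?fundingscheme .
}
GROUP BY ?fundingscheme
# largest programmes first
ORDER BY DESC(?count)
```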

Querying the properties of organisations returns other useful information, such as the geographical location of the project stakeholders. Let’s say we want to find the projects whose coordinating organisations have sites located in France :

Projects whose coordinating organisations have sites located in France 🐓

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?role ?organisation ?country
WHERE {
# select the projects with their acronyms …
?project a eurio:Project.
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … and organisations with ‘coordinator’ role and name …
?project eurio:hasInvolvedParty ?organisationrole.
?organisationrole eurio:roleLabel ?role.
?organisationrole eurio:roleLabel 'coordinator'.
?organisationrole eurio:isRoleOf/eurio:legalName ?organisation.
# … with address country for the sites defined at ‘FR’
?organisationrole eurio:isRoleOf/eurio:hasSite/eurio:hasAddress/eurio:addressCountry ?country.
VALUES ?country { 'FR' }
} LIMIT 100

▶️ See the results

Depending on the available data, you can query either via the PostalAddress info (eurio:addressCountry 'FR') or via the AdministrativeArea (eurio:hasGeographicalLocation)… Here we’re lucky, as both fields are mandatory.

Last but not least, we can also play with the CORDIS vocabularies : here you can choose to search by plain keywords in Projects or Publications items, querying titles, abstracts or other types of literals…

Here is an example of projects whose abstracts contain the string ❄ ‘winter’ ❄, with a URL giving the exact link to the project online :

Looking for ❄ ‘winter’ ❄ in CORDIS projects abstracts (with nice URL to go)

PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?abstract ?url
WHERE {
# select the projects with their acronyms and abstracts …
?project rdf:type eurio:Project.
?project eurio:hasAcronym/eurio:shortForm ?acronym.
?project eurio:abstract ?abstract.
# … with a filter on abstracts containing string 'winter' case insensitive …
FILTER (regex(str(?abstract), 'winter', 'i'))
# … generating proper CORDIS website URLs based on RCN project code
# (str() keeps CONCAT valid even if the RCN is not stored as a plain string)
?project eurio:rcn ?rcn.
BIND(IRI(CONCAT('https://cordis.europa.eu/project/rcn/', str(?rcn))) AS ?url)
} LIMIT 100

▶️ See the results

But the most fun way is to use the EuroSciVoc taxonomy (and navigate through the thesaurus hierarchy) : to do so, we follow the property « eurio:hasEuroSciVocClassification » to reach the Concepts, then their skosxl:prefLabel and skosxl:literalForm properties, to finally obtain the thesaurus labels (don’t forget to choose a preferred language with a FILTER on lang()) :

Projects with their associated EuroSciVoc keywords (English prefLabels 💂)

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?acronym ?ESV
WHERE {
# select the projects with their acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … with EuroSciVoc Classification prefLabels …
?project eurio:hasEuroSciVocClassification/skosxl:prefLabel/skosxl:literalForm ?ESV.
# … only returning 'English' prefLabels
FILTER (lang(?ESV) = 'en')
} LIMIT 100

▶️ See the results

A slightly more complex one uses the first level of the taxonomy hierarchy : here we search for all concepts « with no broader concept » (the FILTER NOT EXISTS clause), i.e. the top concepts or root concepts of the vocabulary used to describe the projects, then count the projects in each category :

All root categories of EuroSciVoc used to describe the projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc top categories …
SELECT (COUNT(?project) AS ?nbProject) ?ESV_root_label
WHERE {
# … the top categories are Concepts …
?ESV_root a skos:Concept .
# … with no broader Concept …
FILTER NOT EXISTS { ?ESV_root skos:broader ?anything }
# … list with corresponding projects …
?ESV_root ^skos:broader*/^eurio:hasEuroSciVocClassification ?project .
# … and EuroSciVoc corresponding skos-xl prefLabels …
?ESV_root skosxl:prefLabel/skosxl:literalForm ?ESV_root_label.
# … sorting by EuroSciVoc category, with English prefLabels
FILTER (lang(?ESV_root_label) = 'en')
} GROUP BY ?ESV_root_label
LIMIT 100

▶️ See the results
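The path used in the query above combines two operators : « ^ » follows a property in the reverse direction, and « * » repeats it zero or more times. As a sketch, the same pattern can be written with forward properties only ; the variable name ?ESV_concept is illustrative.

```sparql
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
SELECT (COUNT(?project) AS ?nbProject) ?ESV_root_label
WHERE {
  # root concepts : concepts with no broader concept
  ?ESV_root a skos:Concept .
  FILTER NOT EXISTS { ?ESV_root skos:broader ?anything }
  # the project is classified under some concept …
  ?project eurio:hasEuroSciVocClassification ?ESV_concept .
  # … which is the root itself (zero steps) or one of its descendants
  ?ESV_concept skos:broader* ?ESV_root .
  ?ESV_root skosxl:prefLabel/skosxl:literalForm ?ESV_root_label .
  FILTER (lang(?ESV_root_label) = 'en')
}
GROUP BY ?ESV_root_label
```

In both forms, a project classified under two concepts of the same root is counted twice ; COUNT(DISTINCT ?project) would count it once, if that is the figure wanted.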

… and maybe even more explicit results if refined to level 2 of the hierarchy 👀 :

All ‘level 2’ categories of EuroSciVoc used to describe the projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc level 2 top categories …
SELECT (COUNT(?project) AS ?nbProject) ?ESV_root_label ?ESV_level2_label
WHERE {
# … the top categories are Concepts …
?ESV_root a skos:Concept .
# … with no broader Concept …
FILTER NOT EXISTS { ?ESV_root skos:broader ?anything }
# … list level 2 category below level 1 with corresponding projects …
?ESV_root ^skos:broader ?ESV_level2 .
?ESV_level2 ^skos:broader*/^eurio:hasEuroSciVocClassification ?project .
# … and EuroSciVoc corresponding skos-xl prefLabels …
?ESV_root skosxl:prefLabel/skosxl:literalForm ?ESV_root_label.
?ESV_level2 skosxl:prefLabel/skosxl:literalForm ?ESV_level2_label.
# … sorting by EuroSciVoc category, with English prefLabels
FILTER (lang(?ESV_root_label) = 'en')
FILTER (lang(?ESV_level2_label) = 'en')
} GROUP BY ?ESV_root_label ?ESV_level2_label
ORDER BY ?ESV_root_label
LIMIT 100

▶️ See the results

And one last little query with a count, to list the most used EuroSciVoc Concepts for indexing projects :

Most used EuroSciVoc Concepts for indexing projects

PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# count the number of projects by EuroSciVoc Concept …
SELECT (COUNT (?project) as ?count) ?ESV
WHERE {
#  … select the projects with their acronyms …
?project eurio:hasAcronym/eurio:shortForm ?acronym.
# … with EuroSciVoc Classification prefLabels …
?project eurio:hasEuroSciVocClassification/skosxl:prefLabel/skosxl:literalForm ?ESV.
# … sorting by EuroSciVoc Concept, with English prefLabels
FILTER (lang(?ESV) = 'en')
} GROUP BY ?ESV
ORDER BY DESC(?count)
LIMIT 3000

▶️ See the results

💡 This one might be ideal for generating a word cloud ?

What if we send the CSV data to some nice online word cloud generator then ?

[Figure : CORDIS taxonomy word cloud]

(OMG they also have a shooting star shape 🌠 in there 🤩)

As a conclusion…

According to science (as reported by CORDIS !), New Year’s resolutions are hard to keep… mostly because they are too ambitious, too restrictive or imprecisely formulated : indeed, « the effectiveness of resolutions depends on how they are framed. »

For Horizon 2024, let’s suggest a well-framed (RDF ?) one : may the CORDIS SPARQL endpoint initiative be an example for other organisations that want to share Linked Open Data !

Wishing you Best Interoperability and a Very Merry ✨ Sparqling New Year ! ✨

