20 February 2020

Semantic Markdown Specifications

Markdown (MD) has become the de facto standard syntax for writing on the web, popularized by GitHub and Stack Overflow. It is used heavily every time one needs to enter a comment or write a simple, document-style HTML page. What if we could embed semantic annotations in a Markdown document? We would get Semantic Markdown! Imagine the best of both worlds: human-readable and human-writable documents combined with machine-readable and machine-writable (RDF) structured data. We could feed an RDF knowledge graph coupled with our set of MD documents, and we would have an easy way to put structure into content.

I see a lot of potential in this, and I already see some use cases. Unfortunately, I have neither the bandwidth nor the full set of skills to make this happen. So I am writing this in the hope that someone implements the idea, or that someone tells me it is total nonsense…

Here are the semantic annotation use cases I see for such a Semantic Markdown:

Annotate a span or title that corresponds to an entity;
Annotate a piece of text with an existing URI for an entity;
Create some statements about an entity.

Note that I am not necessarily looking for a way to produce RDFa annotations on the generated HTML, although that would be nice for a schema.org use-case. Any conversion route from the original semantically annotated markdown to a set of triples would be fine.

My main source of inspiration is the "Span Inline Attribute Lists" feature of the Kramdown syntax.

Annotate a span that corresponds to an entity

This piece of Semantic Markdown:

Tomorrow I am travelling to _Berlin_ {.schema:Place}

When interpreted by a Semantic Markdown parser, it would produce this set of triples:

_:1 a <http://schema.org/Place> .
_:1 rdfs:label "Berlin" .

The span immediately preceding the "{.xxxx}" annotation is taken as the label of the entity. The use of rdfs:label to store the label could be a parser configuration option.
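As a rough illustration of this rule, here is a minimal Python sketch. It is not a real Markdown parser: it assumes a single regular expression is enough to spot an emphasised span (`_text_` or `*text*`) followed by a `{.prefix:Name}` annotation. The names `span_triples` and `PREFIXES` are hypothetical, and only the `schema:` prefix is known to it.

```python
import re

# Assumption: a hard-coded prefix table with only schema: declared.
PREFIXES = {"schema": "http://schema.org/"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# An emphasised span (_text_ or *text*) followed by {.prefix:Name}.
SPAN = re.compile(
    r"([_*])(?P<label>[^_*]+)\1\s*\{\.(?P<prefix>\w+):(?P<name>\w+)\}"
)


def span_triples(text):
    """Return (subject, predicate, object) tuples for each annotated span."""
    triples = []
    for n, match in enumerate(SPAN.finditer(text), start=1):
        node = f"_:{n}"  # one blank node per annotated span
        type_iri = PREFIXES[match["prefix"]] + match["name"]
        triples.append((node, RDF_TYPE, type_iri))
        triples.append((node, RDFS_LABEL, match["label"]))
    return triples


if __name__ == "__main__":
    for triple in span_triples("Tomorrow I am travelling to _Berlin_ {.schema:Place}"):
        print(triple)
```

Swapping `RDFS_LABEL` for another property would be one way to expose the label predicate as a configuration option.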

One could imagine that a Semantic Markdown parser relies on the same RDFa Initial Context to interpret the "schema:" prefix without further declaration. But what about other ontologies? We would need some kind of prefix / vocab declaration somewhere in the document, just like in RDFa.
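Prefix resolution could work along these lines. This is only a sketch: `INITIAL_CONTEXT` lists a few prefixes taken here as a stand-in for the RDFa Initial Context, the `declared` argument stands for whatever document-level declaration syntax would eventually be chosen (none is defined here), and the `ex:` prefix in the usage line is purely hypothetical.

```python
# Assumption: a small stand-in for the RDFa Initial Context.
INITIAL_CONTEXT = {
    "schema": "http://schema.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie, declared=None):
    """Expand 'prefix:local' into a full IRI.

    Prefixes declared in the document override the built-in ones.
    """
    prefixes = {**INITIAL_CONTEXT, **(declared or {})}
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("schema:Place"))
    # 'ex' is a hypothetical prefix declared by the document itself.
    print(expand("ex:Thing", {"ex": "http://example.org/ns#"}))
```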

Note also that a Markdown parser supporting the "{.xxxxx}" syntax will also insert this value as a CSS class on the corresponding span, so we win on both the CSS level and the semantic level.

Annotate a title

Similarly, we could annotate a title:

### European Semantic Web Conference {.schema:Event}
Lorem ipsum...

In that case, the full content of the title is interpreted as the label of the entity:

_:1 a <http://schema.org/Event> .
_:1 rdfs:label "European Semantic Web Conference" .
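The title case can be sketched the same way. The snippet below assumes ATX-style headings (`#` to `######`) with the annotation at the very end of the line; `heading_triples` and `PREFIXES` are hypothetical names, and the output lines mimic the Turtle shown above.

```python
import re

# Assumption: only the schema: prefix is declared.
PREFIXES = {"schema": "http://schema.org/"}

# An ATX heading whose text ends with a {.prefix:Name} annotation.
HEADING = re.compile(
    r"^#{1,6}\s+(?P<label>.+?)\s*\{\.(?P<prefix>\w+):(?P<name>\w+)\}\s*$"
)


def heading_triples(line, node="_:1"):
    """Return Turtle-like lines for an annotated heading, or [] otherwise."""
    match = HEADING.match(line)
    if match is None:
        return []
    type_iri = PREFIXES[match["prefix"]] + match["name"]
    # Escape backslashes and quotes so the label stays a valid literal.
    label = match["label"].replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"{node} a <{type_iri}> .",
        f'{node} rdfs:label "{label}" .',
    ]


if __name__ == "__main__":
    for line in heading_triples("### European Semantic Web Conference {.schema:Event}"):
        print(line)
```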

Annotate with a known URI

Tomorrow I am travelling to [Berlin](https://www.wikidata.org/wiki/Q64) {.schema:Place}

would yield:

<https://www.wikidata.org/wiki/Q64> a <http://schema.org/Place> .
<https://www.wikidata.org/wiki/Q64> rdfs:label "Berlin" .
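A sketch of the known-URI case, under the assumption that the annotation directly follows an inline link (optionally separated by whitespace). The link target replaces the blank node as subject. `link_triples` and `PREFIXES` are hypothetical names.

```python
import re

# Assumption: only the schema: prefix is declared.
PREFIXES = {"schema": "http://schema.org/"}

# An inline link [label](uri) followed by a {.prefix:Name} annotation.
LINK_ANNOT = re.compile(
    r"\[(?P<label>[^\]]+)\]\((?P<uri>[^)\s]+)\)\s*"
    r"\{\.(?P<prefix>\w+):(?P<name>\w+)\}"
)


def link_triples(text):
    """Return Turtle-like lines that use the link target as the subject."""
    lines = []
    for match in LINK_ANNOT.finditer(text):
        subject = "<" + match["uri"] + ">"
        type_iri = PREFIXES[match["prefix"]] + match["name"]
        label = match["label"]
        lines.append(f"{subject} a <{type_iri}> .")
        lines.append(f'{subject} rdfs:label "{label}" .')
    return lines


if __name__ == "__main__":
    sample = (
        "Tomorrow I am travelling to "
        "[Berlin](https://www.wikidata.org/wiki/Q64) {.schema:Place}"
    )
    for line in link_triples(sample):
        print(line)
```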

Describe an entity

If a list follows an annotated entity, then it should be interpreted as a set of predicates with this entity as subject:

### Specifications Meeting {.schema:Event}

* Date : _11/10_{.schema:startDate}
* Place {.schema:location} : Our office, Street name, 75014 Paris
* Meeting participants : 
  {.schema:attendee}
  * Thomas Francart{.schema:Person}
  * [Someone else](https://www.wikidata.org/wiki/Q80)
  * Tim Foo
* Description : Some information not annotated

### Next title
Lorem ipsum...

Should yield:

_:1 a <http://schema.org/Event> .
_:1 rdfs:label "Specifications Meeting" .
_:1 <http://schema.org/startDate> "11/10" .
_:1 <http://schema.org/location> "Our office, Street name, 75014 Paris" .
_:1 <http://schema.org/attendee> _:2 , <https://www.wikidata.org/wiki/Q80>, _:3 .

# attendee that is annotated : we know a type and a name
_:2 a <http://schema.org/Person> .
_:2 rdfs:label "Thomas Francart" .

# attendee that is annotated with a URI : we keep the URI and add a label to it (?)
<https://www.wikidata.org/wiki/Q80> rdfs:label "Someone else" .

# attendee that is not annotated - but we know he was an attendee
_:3 rdfs:label "Tim Foo" .

If a list follows a title or a paragraph that contains an annotated entity…
then the items in this list correspond to properties of this entity…
and each item can be annotated with a property.
The property annotation can be placed on an inline text, or right before or after a `:` or `=` character.
If the property annotation immediately precedes a list, then all items in this list are considered values for that property. In that case they can be entities annotated with a type, entities identified by a URI, or entities that are not annotated (which we would treat as blank nodes with only a label).
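These rules can be approximated with a small line-based state machine. The sketch below is deliberately simplified and makes several assumptions: only the `schema:` prefix is supported, top-level items start with `* ` in the first column, a property annotation sits either on the item line or alone on a line just before a nested list, and values after `:` are kept as plain literals. `describe`, `new_node` and `literal` are hypothetical names.

```python
import re

SCHEMA = "http://schema.org/"  # assumption: only schema: is supported
ANNOT = re.compile(r"\{\.schema:(\w+)\}")        # {.schema:Name}
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")  # [label](uri)

DOC = """\
### Specifications Meeting {.schema:Event}

* Date : _11/10_{.schema:startDate}
* Place {.schema:location} : Our office, Street name, 75014 Paris
* Meeting participants :
  {.schema:attendee}
  * Thomas Francart{.schema:Person}
  * [Someone else](https://www.wikidata.org/wiki/Q80)
  * Tim Foo
* Description : Some information not annotated

### Next title
Lorem ipsum...
"""


def describe(markdown):
    """Return (subject, predicate, object) tuples written as Turtle terms."""
    triples = []
    subject = None    # entity introduced by the last annotated heading
    list_prop = None  # property applied to the items of a nested list
    counter = 0       # blank node counter

    def new_node():
        nonlocal counter
        counter += 1
        return f"_:{counter}"

    def literal(text):
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    for raw in markdown.splitlines():
        line = raw.strip()
        annot = ANNOT.search(line)
        text = ANNOT.sub("", line).strip()  # the line without its annotation
        if line.startswith("#"):
            # A heading starts a new subject only when it is annotated.
            subject, list_prop = None, None
            if annot:
                subject = new_node()
                triples.append((subject, "a", f"<{SCHEMA}{annot[1]}>"))
                triples.append((subject, "rdfs:label", literal(text.lstrip("# "))))
        elif subject is None or not line:
            continue
        elif raw.startswith("* "):
            # Top-level item: "name : value", annotated with a property.
            list_prop = None
            value = text[2:].partition(":")[2].strip(" _*")
            if annot and value:
                triples.append((subject, f"<{SCHEMA}{annot[1]}>", literal(value)))
        elif annot and not text:
            # A bare annotation line applies to the nested list that follows.
            list_prop = annot[1]
        elif line.startswith("* ") and list_prop:
            item = text[2:].strip()
            link = LINK.fullmatch(item)
            if link:
                node, label = f"<{link[2]}>", link[1]
            else:
                node, label = new_node(), item
            triples.append((subject, f"<{SCHEMA}{list_prop}>", node))
            if annot:  # the item carries a type annotation
                triples.append((node, "a", f"<{SCHEMA}{annot[1]}>"))
            triples.append((node, "rdfs:label", literal(label)))
    return triples


if __name__ == "__main__":
    for triple in describe(DOC):
        print(*triple, ".")
```

Unannotated items such as the description line are simply skipped, and the unannotated heading that follows ends the description of the entity.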

Related works

Metadata for Markdown, a Python extension that generates JSON-LD from a YAML section in a Markdown document.

EDIT: Pandoc divs and spans: https://pandoc.org/MANUAL.html#divs-and-spans

I like the <span> syntax:

[This is *some text*]{.class key="val"}

This is close! But it still would not produce triples, unless one explicitly writes RDFa:

My name is [Thomas Francart]{typeof="schema:Person"}
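To see what such a span carries, here is a small sketch that pulls the text, classes and key/value attributes out of the bracketed span syntax. It is a regex approximation written for illustration, not Pandoc's own parser, and `parse_spans` is a hypothetical name. A Semantic Markdown layer could then decide which attributes (for example `typeof`) turn into triples.

```python
import re

# [text]{attributes} as written in the examples above.
SPAN = re.compile(r"\[(?P<text>[^\]]+)\]\{(?P<attrs>[^}]*)\}")
# Either a .class token or a key="value" pair.
ATTR = re.compile(r'\.(?P<cls>[\w:-]+)|(?P<key>[\w-]+)="(?P<val>[^"]*)"')


def parse_spans(markdown):
    """Return one dict per bracketed span: text, classes, attributes."""
    spans = []
    for match in SPAN.finditer(markdown):
        classes, attributes = [], {}
        for attr in ATTR.finditer(match["attrs"]):
            if attr["cls"]:
                classes.append(attr["cls"])
            else:
                attributes[attr["key"]] = attr["val"]
        spans.append(
            {"text": match["text"], "classes": classes, "attributes": attributes}
        )
    return spans


if __name__ == "__main__":
    print(parse_spans('[This is *some text*]{.class key="val"}'))
    print(parse_spans('My name is [Thomas Francart]{typeof="schema:Person"}'))
```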

Post Tagged: json-ld, markdown, RDF, rdfa, schema.org


There are 15 comments for this article

Vernay Bruno 11 April 2020 at 10:15

At some point, XML is more readable …
I am not sure that Markdown is the right tool for this job

- Thomas Francart Author 13 April 2020 at 8:49
  
  Hello
  Do you really think XML is more readable _for a human_ than Markdown? Do you really think a human author can _write_ XML more easily than Markdown?
  The use case here is a human author editing a document and embedding structured/semantic annotations in it while writing. Markdown may not be perfect, but I think it is the closest thing that exists today for this job.
  Thomas
  
  - Vernay Bruno 13 April 2020 at 10:53
    
    Don’t get me wrong. I am a huge fan of Markdown and Asciidoc. I use them often and it is great and simple.
    I like the simplicity of the syntax, but if I have to type `_11/10_{.schema:startDate}` or `[Thomas Francart]{typeof="schema:Person"}`,
    I lose this simplicity and almost miss the consistency of XML. But I would never go back to XML to write text, that is for sure.
    Maybe tools will provide enough help … we will see.
    
    - Thomas Francart Author 14 April 2020 at 15:38
      
      Thanks. The whole point here is to find a balance between syntax simplicity (assuming, however, that the writer is accustomed to structured data and annotation; otherwise we fall back to text-mining solutions to interpret sentences written in plain text) and coverage of the use cases described: annotating with an entity type, annotating with an entity URI, or describing an entity.
      Do note that the proposed "semantic extensions" are based on, or inspired by, other existing Markdown extensions.
      Please do suggest alternative syntaxes that you think would be more appropriate for a writer, if you have any.
      
      - Vernay Bruno 14 April 2020 at 19:40
        
        I would have kept the same syntax as links: [Berlin](Place) and [10/11](Date).
        That may be too simple.
        And if there is already a link, that would obviously complicate things: [Berlin](Place https://…/). As long as there is only a space in between, it would be OK.
        
        Also, using some kind of "front matter" to define the schema used would be necessary, I guess.
        
        But that is just a random opinion; I am not involved in any of this. Thanks a lot for the attention you have already given me.
- Björn Sackemark 30 March 2021 at 21:23
  
  Agreed! Making a simple concept complex again is backwards. Gruber created Markdown to simplify the writing of HTML. It was intended to be an authoring language (which also is highly human-readable). A simple syntax for writing in plain text, which scripts in all languages can convert to HTML.
  
  Semantic metadata is useful, but again, complexity increases for each additional feature.
  
  I agree re. XML. Aren’t schemas already applicable to XML, besides being designed for HTML?
  
  Alternatively, you could be using JSON — for those who prefer C syntax and curly brackets over .…
  
  (Or YAML; really the HTML to JSON’s XML…)
  Don’t get me wrong. I applaud the effort here. I’ve been writing and thinking in Markdown for a decade or so.
  
  Schemas—which are standardizing how products, news articles, people and more, are defined in the markup on the web—those are exciting too.
  
  (As the FAANG companies have shown us, metadata is obviously useful and can be used for a lot of stuff.)
  
  But remember DRY and especially KISS. Keeping things simple is important.
  
  Don’t complicate solutions created to simplify things. I can’t imagine how you’d be successful in doing that.
  
Luis Pozo 12 April 2020 at 21:35

This post is very interesting. Today I was wondering how a blog based on a knowledge graph could work. The basic structure was simple: just link the posts using some kind of topic or tag. But then I realized that if I could implement something like Semantic Markdown to add another layer of knowledge to the graph, it would be pretty cool. And suddenly I found this: a basic outline of how SeMD should work.
I don’t know if I will be able to implement it, but I’ll let you know if I do something similar.

- Thomas Francart Author 13 April 2020 at 8:46
  
  I’d be very interested to know if you do something similar; please let me know. I think the most promising thing is the work in Pandoc: https://pandoc.org/MANUAL.html#divs-and-spans
  Cheers
  
Jonas Smedegaard 13 April 2020 at 11:46

Great post.

Yes, I also think that a good first step would be to implement this idea in Pandoc. More specifically as a Pandoc filter: https://pandoc.org/filters.html

more urgent jest 15 May 2020 at 8:50

you might find this interesting:

https://mbakeranalecta.github.io/sam/index.html

more urgent jest 15 May 2020 at 8:54

i think you might find this interesting:

[SAM](https://mbakeranalecta.github.io/sam/index.html)

Chris McGee 28 May 2020 at 15:36

> One could imagine that a semantic markdown parser relies on the same RDFa Initial Context to interpret the "schema:" prefix without further declaration. But what about other ontologies? we would need some kind of prefixes / vocab declaration somewhere in the document, just like in RDFa.

First, thank you for writing this up. I’m very excited at the possibilities of merging semantic web concepts with Markdown as a format that is both human readable (for the most part) and machine parseable.

One thing I value about Markdown is that everything is in plain sight, even if you have to scroll through the document. I’ve been trying to keep my Markdown more readable using a technique for dealing with URL repetition. Here’s a quick example of a "footnote" in Markdown.

Lorem ipsum [dolor sit] … Here we see the famous words "[dolor sit]" …

… rest of the document …

---
[dolor sit]: http://somewebsite.org/dolor_sit.html

I wonder if this mechanism could be applied to your schema examples to keep the individual semantic references short, but allowing someone not familiar with a specific schema to look it up. Meanwhile, if you are already familiar it just sits at the bottom of the document out of the way and repeated only once.

### Specifications Meeting {schema:Event}
* Date : _11/10_{schema:startDate}

… the rest of the document …

---
[schema]: http://schema.org
[rdfs]: http://www.w3.org/2000/01/rdf-schema#
… other schemas used in this document …

Ivo 19 March 2021 at 15:26

Yes, that’s much needed.
The applications are plenty.
Now the rise of tools such as Roam Research, Obsidian, RemNote, Athens, and Logseq, to name a few, amplifies the number of use cases.

Niko 24 November 2021 at 0:55

You can have a look at SAM, which specifies Semantic Authoring Markup language
https://mbakeranalecta.github.io/sam/quickstart.html

Unfortunately, it does not build on MD but redefines all the formatting syntax, together with a basic templating syntax à la Mustache (conditions, variables).

It is still interesting, especially the part about annotations and context, which are very similar to predicates and ontological context in RDF.

Niko 24 November 2021 at 1:02

There is also this very simple RDFa syntax ported to MD:
https://github.com/sbmsuite/roundpin/wiki/Semantic-Markdown

