I was recently looking into automatically sourcing event data directly from venue websites. What I found was stuff like this:
And I cried a little. What happened to semantic HTML?
Semantic HTML is the use of HTML markup to reinforce the semantics, or meaning, of the information in webpages rather than merely to define its presentation or look.
We have all this semantically rich information (cinema showtimes and locations, for example) curated over countless man-hours, and it’s all just sitting on private database servers. The exposed representation, as you can see above, is not enriched, not semantic, and not really usable.
OK, semantically poor HTML is still usable. I just have to implement a generic date parser, a natural-language processor, a domain-specific location parser, and maybe I can throw in a neural network too, huh?
All that effort developing software that can parse and make sense of data that should have been semantic to begin with. What a waste of time & effort.
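For contrast, here is a rough sketch of how little code the same job would take if venues marked up their listings semantically. The listing below is entirely hypothetical (the film, the date, and the choice of schema.org `Event` microdata are my assumptions, not any real venue's markup); the only thing it relies on is the standard HTML `<time>` element with a machine-readable `datetime` attribute, read with Python's built-in `html.parser`.

```python
from datetime import datetime
from html.parser import HTMLParser

# Hypothetical listing: the film title and showtime are invented for illustration.
SEMANTIC_HTML = """
<article itemscope itemtype="https://schema.org/Event">
  <h2 itemprop="name">Example Film</h2>
  <time itemprop="startDate" datetime="2015-03-14T19:30">Sat 14 March, 7.30pm</time>
</article>
"""


class TimeExtractor(HTMLParser):
    """Collects the datetime attribute of every <time> element as a datetime object."""

    def __init__(self):
        super().__init__()
        self.start_times = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            value = dict(attrs).get("datetime")
            if value:
                # The attribute is already ISO 8601, so no guessing at
                # "7.30pm" versus "19:30" versus "half seven" is needed.
                self.start_times.append(datetime.fromisoformat(value))


def extract_start_times(html):
    """Return every machine-readable start time found in the given HTML."""
    parser = TimeExtractor()
    parser.feed(html)
    return parser.start_times


print(extract_start_times(SEMANTIC_HTML))
```

No date-guessing, no natural-language processing, no neural anything: the human-friendly text stays in the element body and the machine-friendly value sits right next to it.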
I know this plea is fruitless though. People want their magical web with embedded tweets and Instagram photos. They couldn’t care less about semantic enrichment.
But we should care. We understand the tenets of the web: the simple fact that, via HTTP, You or A Server In Timbuktu can request a resource and get back a corresponding representation in HTML. That universality is squandered every time we write crap HTML or choose not to progressively enhance.
The concept of the semantic web has been around since 1999, when Tim Berners-Lee expressed his vision of a future web in which computers could understand the context of human speech and thought, to be able to “understand” our meaning when expressing ourselves.
It’s cool though. We’re just holding our breath for an AI that can decipher the mess we’ve created, and when we get there maybe the enriched data will be publicly accessible and not in the hands of a monopolising party. And in the meantime, while we await salvation, we can create short-lived sugar in that new MVVM framework you heard about.
PS. Watch this: