"Dr Spock's Baby Care is a best-selling owner's manual for the most complicated 'product' imaginable -- and it only has two levels of headings. You people have 8 levels of hierarchy and I haven't even stopped counting yet. No wonder you think it's complicated." - Wayback Machine
Data models, ontologies, thesauri and vocabularies: all attempts to systematically arrange the knowledge of our world. These systems form the basis for taxation and for determining ownership through registries and censuses. Their importance grows dramatically now that databases are being opened, shared and linked.
Good models have at least two qualities. First, they are always more complex than a layman would expect, because of exceptions that seem trivial until you realize they are fundamental to how the model works. Second, they are often of great beauty, because they order a part of the world in a systematic way. Nice examples are the map of all buildings in the Netherlands, topographical maps, maps of international students and, of course, the railway maps of the Dutch Railways or those of Japan.
The data used to make these maps come from carefully studied models. An example: the Dutch Registry makes a strict distinction between 'buildings' and 'residential objects'. A block of flats is a 'building', while the apartments in it are separate 'residential objects'. This classification is laid down in a special 100-page document with telling examples (a garage under construction attached to a house is not a building, but a row of garages together is). It is complex matter, and very specific to the Dutch real estate situation.
So, what would someone who has never visited the Netherlands make of the map of all buildings? Many visitors to the online map reported what looked like obvious mistakes in the construction years of buildings. Sometimes whole groups of buildings are dated '1005', for instance in the centre of Amsterdam. Unlike all other dates, this one does not refer to an exact year: it means something like "long ago, we don't know exactly and it is not so important". It is not a mistake in the Registry's database, nor is it caused by a data model that cannot handle unknown or approximate dates.
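A placeholder like '1005' only works for readers who know the convention. As a minimal sketch, a consumer of the data could translate it into an explicit "unknown" value before publishing or linking it. The constant `UNKNOWN_YEAR_SENTINEL`, the `ConstructionYear` type and `parse_construction_year` are hypothetical names for illustration, and the sketch assumes that 1005 is the only placeholder value in use.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: 1005 is the placeholder meaning "long ago, exact year unknown".
UNKNOWN_YEAR_SENTINEL = 1005


@dataclass(frozen=True)
class ConstructionYear:
    """Hypothetical explicit representation of a building's construction year."""
    year: Optional[int]  # None when the year is unknown

    def describe(self) -> str:
        # Say "unknown" out loud instead of printing a misleading year.
        if self.year is None:
            return "unknown (long ago)"
        return str(self.year)


def parse_construction_year(raw: int) -> ConstructionYear:
    """Turn a raw registry value into a self-describing value."""
    if raw == UNKNOWN_YEAR_SENTINEL:
        return ConstructionYear(year=None)
    return ConstructionYear(year=raw)


if __name__ == "__main__":
    for raw in (1005, 1632, 1998):
        print(raw, "->", parse_construction_year(raw).describe())
```

A visitor who never read the Registry's documentation would then see "unknown (long ago)" instead of a sudden burst of medieval construction.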
For someone who knows our country and is a keen observer, it is immediately clear that something is off here. But to a Chinese visitor, this sudden explosion of medieval building activity in a single year may seem less improbable.
The idea behind Linked Open Data is that if data are published together with their context, computers can automatically create links between those data. The Dutch Registry measures buildings in square meters, but the British Land Registry might use square feet. If every database precisely describes its units, this is not an issue: computers can convert the surface areas and thus compare Dutch and British houses.
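The conversion step can be sketched as follows, assuming each area value is published together with an explicit unit. The unit codes `"m2"` and `"ft2"`, the `FloorArea` type and the example houses are hypothetical. In real Linked Open Data the unit would point to a shared, published vocabulary instead of a plain string. The conversion factor is exact: 1 ft = 0.3048 m, so 1 ft² = 0.09290304 m².

```python
from dataclasses import dataclass

# Factors to square metres; 1 ft^2 = 0.3048^2 m^2 = 0.09290304 m^2 exactly.
SQUARE_METRES_PER_UNIT = {
    "m2": 1.0,
    "ft2": 0.09290304,
}


@dataclass(frozen=True)
class FloorArea:
    value: float
    unit: str  # hypothetical unit code standing in for a shared unit vocabulary

    def in_square_metres(self) -> float:
        # Only units the consumer knows how to interpret can be compared.
        try:
            return self.value * SQUARE_METRES_PER_UNIT[self.unit]
        except KeyError:
            raise ValueError(f"unknown unit: {self.unit!r}") from None


dutch_house = FloorArea(120.0, "m2")      # hypothetical Dutch record
british_house = FloorArea(1200.0, "ft2")  # hypothetical British record

if __name__ == "__main__":
    print(f"Dutch house:   {dutch_house.in_square_metres():.1f} m2")
    print(f"British house: {british_house.in_square_metres():.1f} m2")
```

Note that this only works because both records state their unit; the same code cannot tell what a construction year of '1005' is supposed to mean.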
But that is just a matter of units. Capturing social, historical, linguistic and cultural contexts in a data model is notoriously complicated, if not impossible. A data model is practical and meaningful precisely because it is limited to a certain domain and context, which makes it as usable as possible for a specific purpose.
The UK is one of the few countries in the world that does not actually have a cadastral registry like the Dutch one. This may seem surprising, but the whole idea of describing a property in three dimensions does not exist there. Instead, the British have a four-dimensional system of rights: one person may have the right to cultivate a certain piece of land, while someone else has the right to hunt on that same piece of land.
It will be fascinating to see how the semantic web, in which all data will always be linked in real time, deals with these aspects, and what insights and misunderstandings this will produce. The art will be to link data in a model that is easy to understand and therefore remains controllable.
At Waag we try to keep the manual for the objects in this world as simple as possible. That is why the CitySDK API has a map viewer that shows all available data, and it is also why the API works with JSON: data in JSON format remain readable, or at least somewhat comprehensible, for laymen.