Blogging was where we began, and how we built our company, so we have preserved this archive to show how our thinking developed over a decade of developing the use of social technology inside organisations.

Two gurus vs the Semantic Web

by

Lou on Shirky, a short piece on Matt Jones's work & thoughts weblog, links to two broadly anti-Semantic Web viewpoints from information architecture guru Lou Rosenfeld and social software guru Clay Shirky. Both essentially believe that the costs of trying to gather and maintain meaningful metadata outweigh any potential benefits, which means that the Semantic Web will remain a developers' pipedream.
This is an interesting debate, and one which will never really be resolved either way until we can see and use Semantic Web applications and evaluate their effectiveness.
Matt Jones sits on the fence, saying:

“I’m thinking that while, like Lou, I’ve heard a lot of the same evangelising of metadata, (leading, almost inevitably to ‘boil the oceans’ type-projects); there is a lot of R.S.M.M. going on in the background of the current crop of personal and group web tools which means that for a set of problems and markets, that exciting ‘sematic-web’ like stuff will get built, and will prove useful to end-users and affordable to achieve for clients/companies.”

For me, the biggest flaw in Clay’s argument as echoed by Lou is simply that he assumes people involved in developing Semantic Web applications are seeking to create a single overarching ontology or schema that can link together all forms of knowledge. Clearly this is impossible. Clay also suggests that the Semantic Web’s approach to ontology is based on syllogisms (i.e. “If A=B, and B=C, then A=C”), and he goes on to beat the Semantic Web with this particular stick in a rather entertaining way.
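
To make the syllogism point concrete, here is a minimal sketch (in Python, and not drawn from either of the linked pieces) of the kind of mechanical chaining Clay is criticising; the facts and relation names are invented purely for illustration.

```python
# A toy illustration of the "If A=B, and B=C, then A=C" style of inference
# that Clay criticises. The facts and relation names are invented.

facts = {
    ("Besiktas", "is_in", "Istanbul"),
    ("Istanbul", "is_in", "Turkey"),
    ("Turkey", "is_in", "Europe"),   # already a contestable premise
}

def transitive_closure(facts):
    """Derive every A-is-in-C that follows from chaining A-is-in-B and B-is-in-C."""
    pairs = {(s, o) for (s, p, o) in facts if p == "is_in"}
    derived = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(derived):
            for (c, d) in list(derived):
                if b == c and (a, d) not in derived:
                    derived.add((a, d))
                    changed = True
    return derived - pairs

print(transitive_closure(facts))
# e.g. {('Besiktas', 'Turkey'), ('Istanbul', 'Europe'), ('Besiktas', 'Europe')}
# The machine is happy to conclude all of this, but the conclusions are only as
# sound as the human judgements baked into the premises.
```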
However, for most people, the kinds of benefits that the Semantic Web could bring are a lot simpler. In the real world we have different languages with different semantic structures and even alphabets; we are quite comfortable with the idea of creating mappings (translations) between different languages, and this is best done with some human intervention rather than by machines. Why can we not seek to do the same with specific knowledge domains, or even personal worldviews and perspectives?
Let’s say that I have a weblog category about “Turkey”, where Turkey is a country, not a bird, and straddles the Middle East and Europe. If I can expose this basic ontological data, then people with weblog categories about the Ottoman Empire, the Middle East or the football team Besiktas can see that my blog is relevant to them, and weblogs about Christmas recipes will know that I am not interested in their particular Turkey-related content.
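
To illustrate what “exposing this basic ontological data” might look like, here is a minimal sketch; the property names and concept URIs are invented, and a real weblog tool would more likely publish something like this as RDF or plain XML alongside its feeds.

```python
# A minimal sketch of "exposing basic ontological data" for a weblog category.
# The property names and concept URIs below are invented for illustration.

my_category = {
    "label": "Turkey",
    "concepts": {"http://example.org/concept/Turkey_(country)"},
    "related": {"http://example.org/concept/Middle_East",
                "http://example.org/concept/Europe"},
}

other_categories = [
    {"label": "Ottoman Empire",
     "concepts": {"http://example.org/concept/Ottoman_Empire"},
     "related": {"http://example.org/concept/Turkey_(country)"}},
    {"label": "Christmas recipes",
     "concepts": {"http://example.org/concept/Turkey_(bird)"},
     "related": {"http://example.org/concept/Christmas"}},
]

def is_relevant(mine, theirs):
    """Two categories are related if they share any concept reference."""
    mine_all = mine["concepts"] | mine["related"]
    theirs_all = theirs["concepts"] | theirs["related"]
    return bool(mine_all & theirs_all)

for cat in other_categories:
    print(cat["label"], "->", is_relevant(my_category, cat))
# Ottoman Empire -> True    (shares the Turkey-the-country concept)
# Christmas recipes -> False (its turkey is the bird, a different concept)
```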
If you start to think in terms of bottom-up, “emergent” linkages between people, their content and their perspectives, and you are not seeking some ontological silver bullet that infers meaning without human intervention, then the more modest aims of the Semantic Web seem entirely plausible; but the cost/benefit profile of creating and managing metadata remains an issue.

One Response to Two gurus vs the Semantic Web

  1. By William Hoami on March 30, 2005 at 4:45 pm

    What I think is that the “semantic web” is sort of like the Fourier transform of the “searchable web” — they are “inverse solutions of the same problem” in some sense — what can be solved in one domain can always be solved in the other — but at what cost?
    There are some occasions when pre-categorizing content via associated metadata for retrieval via hierarchical ontologies/taxonomies makes sense (generally when the content and metadata are relatively static) — and other occasions where searching content with or without associated metadata makes sense (generally when the content and metadata are relatively dynamic). Some kind of metadata is almost always needed in the case of “non-plaintextual” information — unless one wishes to create huge numbers of complicated “parsers” that can find the “hidden metadata” in (for example) .wav and .mpg files.
    The key difference, however, is that building the ontologies and the taxonomies and the other semantic web infrastructure, and MAINTAINING them all over time (especially when the content and metadata are relatively dynamic), is a Herculean, even Sisyphean, task that is many orders of magnitude more difficult and COSTLY than refining a handful of ever more intelligent search engines that can work on content alone and which also allow for the possibility, rather than forcing the necessity, of associated metadata to aid the search.
    Every OS file system in the world could be modified to allow for “hidden but maintainable” plaintext/XML metadata files, associated with every other kind of file, named just like the corresponding file but pre/suf-fixed with .meta (or some similar file system convention) — with search engines modified to be aware, and take advantage, of their presence. OS file systems already support “file attributes” (metadata!) in their cataloging/directory mechanisms — easy enough to extend them to support additional metadata for searching purposes (see the sketch after this comment).
    One “people issue” is that some folks will never be able to construct an effective simple or complex Boolean search expression to save their lives and feel more comfortable doing the bulk of their “searches” according to pre-defined hierarchical ontologies/taxonomies (e.g. Yahoo!) no matter how many clicks it takes for them to arrive at where they want to go — whereas other folks are able and prefer to reach their information destinations the “Google way”, i.e. think, formulate search expression, click, narrow if necessary, got it!
    But that Yahoo! vs. Google example also illustrates the maintenance issue — how many dozens/hundreds/thousands? of Yahoo! (and other!) content/metadata/site maintainers to handle the semantic web requirements of the entire planet vs. a relative handful of Google search engine developers to handle the searching requirements of the entire planet?
    Continually improved searching mechanisms are by far the most cost-effective approach; the “semantic web” is a pipedream created by ivory tower academicians to provide a new “silver bullet” for Information Management and to provide employment for themselves and many thousands of consultants, developers, content and metadata maintainers, library science professionals (taxonomies, ontologies), etc. Simply put, it is just another “dreamware” product designed to try to draw additional employment and revenue into the IT industry as opposed to providing a genuinely cost-effective approach to managing/sifting through “information overload” – which of course was enabled, if not created, by the IT industry in the first place.
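
A minimal sketch of the sidecar “.meta” convention described in the comment above might look like the following; the file-naming scheme, the XML layout and the trivial search function are all assumptions made for illustration, not a description of any real OS or search engine.

```python
# A minimal sketch of a sidecar ".meta" convention: a plaintext/XML metadata
# file written next to each content file (e.g. song.wav -> song.wav.meta),
# plus a toy search that reads those sidecars. All names and layout here are
# assumptions for illustration only.

import os
import xml.etree.ElementTree as ET

def write_meta(path, **attributes):
    """Write an XML metadata file alongside `path`, named path + ".meta"."""
    root = ET.Element("meta")
    for key, value in attributes.items():
        ET.SubElement(root, key).text = str(value)
    ET.ElementTree(root).write(path + ".meta", encoding="utf-8")

def search(directory, term):
    """Return files whose sidecar metadata mentions `term`: a stand-in for a
    metadata-aware search engine handling non-plaintextual content."""
    hits = []
    for name in os.listdir(directory):
        if not name.endswith(".meta"):
            continue
        text = ET.tostring(ET.parse(os.path.join(directory, name)).getroot(),
                           encoding="unicode")
        if term.lower() in text.lower():
            hits.append(name[:-len(".meta")])
    return hits

# Usage: tag a sound file that a plain-text indexer could never parse.
# write_meta("holiday.wav", title="Call to prayer, Istanbul", subject="Turkey travel")
# print(search(".", "Istanbul"))   # -> ["holiday.wav"]
```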