by Lee Bryant

This is a Headshift blog post by Lee Bryant, written on August 30, 2004. It has (3) comments, the latest of which was on December 23, 2008.

Can social tagging overcome barriers to content classification?

Asking users to classify content and generate metadata within online knowledge sharing systems can improve the findability of content, but it has two main problem areas:

  1. The taxonomy or metadata structure may be too rigid to support user needs
  2. The overheads of classification are borne by the user, but the group reaps the benefits

In The Cognitive Cost of Classification, Jess McMullin considers how faceted classification systems might help overcome the first problem, but may exacerbate the second. He goes on to consider how simple, user-generated "social classification" can approach the problem from another angle.

This is the approach recently dubbed Folksonomy (cf. taxonomy) that draws on the lessons of Furl, Flickr and Del.icio.us. The idea, in essence, is that people add free-text tags to their content; where people happen to use the same terms, their content is linked, and frequently used terms float to the top, creating a kind of positive feedback loop for popular tags.
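The mechanics described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how Furl, Flickr or Del.icio.us actually implement tagging: the function names, the sample posts and the decision to lower-case and trim tags before matching are all hypothetical.

```python
from collections import Counter, defaultdict


def build_tag_index(tagged_items):
    """Map each normalised tag to the set of item ids that carry it."""
    index = defaultdict(set)
    for item_id, tags in tagged_items.items():
        for tag in tags:
            # Assumption: tags match if they are equal after trimming and lower-casing.
            index[tag.strip().lower()].add(item_id)
    return index


def related_items(index, item_id):
    """Return the items that share at least one tag with item_id."""
    related = set()
    for items in index.values():
        if item_id in items:
            related |= items
    related.discard(item_id)
    return related


def popular_tags(index, n=3):
    """Return the n most used tags with the number of items using each."""
    counts = Counter({tag: len(items) for tag, items in index.items()})
    return counts.most_common(n)


# Hypothetical sample data: four posts tagged by different people.
posts = {
    "post-1": ["Folksonomy", "tagging"],
    "post-2": ["tagging", "metadata"],
    "post-3": ["taxonomy"],
    "post-4": ["tagging", "folksonomy "],
}

index = build_tag_index(posts)
print(sorted(related_items(index, "post-1")))
print(popular_tags(index, 2))
```

Nobody has to agree on a vocabulary in advance. Items become linked simply because their authors happened to choose the same words, and the tag counts give a crude popularity ranking that could be shown back to users to reinforce common terms.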

Jon Udell recently picked up on how this promises to change the way we deal with knowledge sharing within the enterprise in an InfoWorld column entitled Collaborative knowledge gardening.

The obvious disadvantage of this approach is its lack of precision (synonym/antonym control, related terms, context, etc), but in most practical usage scenarios the trade-off between simplicity and precision makes sense.

Clay Shirky, who like many others is excited by the possibilities of this approach, points out that even with simple tags, some degree of higher-level organisation can be achieved:

"Lack of precision is a problem, though a function of user behavior, not the tags themselves. del.icio.us allows both hierarchical tags, of the weapon/lance form, as well as compounds, as with SocialSoftware. So the issue isn't one of software but of user behavior. As David pointed out, users are becoming savvier about 2+ word searches, and I expect folksonomies to begin using tags as container categories or compounds with increasing frequency"

Whilst this is true, it neglects the fact that the main benefit of folksonomy in the real world will be to extract and link metadata from users who do not have the confidence or the inclination to apply anything more than a keyword. These users are unlikely to spoof hierarchical or compound categories from free-text tags. Considering it has taken a decade for users to start using 2+ terms in searches, I think we need to assume nothing more than the possibility of people entering simple keywords for the time being, and focus our efforts on doing clever things with what little contextual metadata accompanies them (time, place, person profile, context, etc).

At Headshift, we have been using this system for about six months in a live usage environment with a community of non-web-savvy users and it is starting to produce enough data for us to reflect on how it is working (see for example slide 23 from my recent BlogTalk case study; more detail here). We use it in conjunction with a limited set of formal, faceted classifications to cover non-contentious aspects of use (e.g. regional information and organisational programmes) in addition to group-defined weblog categories, which are something of a middle ground between the two approaches. I think this combination will prove to be overkill, and we will surely refactor at some stage, but I am not yet sure which works best for users, and in which context.

One thing's for sure: social tagging is a revelation for anybody who has sat through days of agonising taxonomy design with client organisations who are unsure of their users' real needs. It is an excellent illustration of the advantages offered by simple, emergent and iterative systems over old-skool top-down communications software.

3 Comments

I still have to think more on folksonomy — I'm still reading George Lakoff's book, "Women, Fire, and Dangerous Things — What Categories Reveal about the Mind." But three things struck me initially about the utility of folksonomies:
1) "...and it is starting to produce enough data for us to reflect on how it is working..." For folksonomies to come into being, there has to be time and participation. When designing systems for immediate use, we don't often have such a luxury (especially if we don't have regular customer input on how they're thinking and what they're calling things).

2) It seems that one thing only alluded to in discussions of the deficiencies of social taxonomies is semantics and hierarchy, but I think that semantic web technologies, usage patterns, and natural hierarchies of concepts for the terms selected could overcome some of the deficiencies where more precision is needed. With what we know, we can help alleviate the disadvantages of "lack of precision (synonym/antonym control, related terms, context, etc)..."

3) I've looked over the shoulder of a great number of people at their desktops, directory structures, and email-classification schemes; and my expectation that robust taxonomies will arise naturally out of the social soup is pretty low. Granted, I am not factoring in influence as a refining agent; but as Lakoff would say, refined taxonomies beyond the "basic-level categories" are "imagination" — they are conceptual. I do not believe that taxonomological refinement can happen spontaneously without some conceptual prodding (where refinement is needed, that is). I am assuming "robust," of course — and you referred to the tradeoff between precision and simplicity. I have to think through that one a bit more before I can agree or disagree that there need be a tradeoff at all.

I think that we know enough about semantics and dynamic classification systems to devise technologies that would certainly play a strong role in maturing social taxonomies (the prodding part) instead of watching to see how far evolution can take us.

Connectedy is a social bookmarking tool that for the most part solves the "overhead of classification" problem by harnessing the structure implicit in one's browser bookmarks to formulate categorical pages containing the user's links. Connectedy also serves its community by search-engine-optimizing the content and availing them of embedded search forms. Privacy is managed by users -- they may opt any of their content out of the community at any time.

It is always better to do tagging when you post.
Better still is to use phrases rather than only keywords.
Greetings and merry Xmas from Italy
