by Lee Bryant

This is a Headshift blog post by Lee Bryant, written on February 18, 2005. It has one comment, posted on February 18, 2005.

Can robot learning teach us how to share emergent metadata?

This week saw the launch of the Augmented Human Interaction Lab at Queen Mary College, University of London. To mark the occasion, Professor Luc Steels of the Sony CSL, Paris and University of Brussels VUB AI Laboratory delivered a lecture entitled "The Origins of Language and Meaning," to which we were kindly invited by Dr Pat Healey.

It was a fascinating lecture by a very engaging speaker who is clearly an expert in this interesting area. We have spent some time recently thinking about the limitations of currently popular social tagging and folksonomy techniques applied to emergent metadata, and looking for ways to limit the domain scope of individual tags and terms without losing the social sharing aspect that makes them so interesting in the first place. We are also striving for a lightweight markup approach that can allow individual groups to link tags in a namespace to produce rudimentary ontologies using simple discrimination techniques over multiple iterations, for example by asking "do you mean Blue the colour or Blue the mood?" when adding tags to shared objects. Consequently, I was intrigued by the possibility of agent-based negotiation of terms and coordination of categories between populations and data sets. This has triggered some ideas that I will write up separately, but rather than pollute Professor Steels's ideas with my own ramblings, I thought it best to write up my brief notes about his lecture first.

Professor Steels began by showing some videos of state-of-the-art robotics from the Sony Computer Science Lab in Paris, demonstrating complex functions such as external adaptive response, real-time 3D camera vision and uneven terrain control, plus what he called the Turing test for dogs (video), which marked the robot AIBO's first attack by a real dog in April 2000.

All very entertaining stuff (the BBC has more about how robots learn to walk today), but as Professor Steels pointed out, the next step is to recreate the real intelligence required for communication and language, which is a more substantial challenge.

Communication requires us to:

  1. influence joint attention
  2. use representations, such as conceptualisation and expression
  3. create conventionalised communications to share these conceptualisations and forms of expression

For example, Professor Steels pointed to young children, who are often able to create new visual representations but sometimes do so without shared conventionalisations, which means that adults cannot understand their scribbles. We need both the representations and the shared context of conventionalisations in order to communicate.

To explore this theme, he gave an account of the 1999 Talking Heads experiment that explored how agents negotiate shared meaning from a starting point of very little shared understanding (publications are available here). This involved two cameras pointing at the same white-board with coloured objects (shapes) that are used as a common reference point to negotiate meaning, as described on the experiment's web site:

The Talking Heads are robotic agents running on a computer, looking at the world through digital cameras. Together, they play language games.

One agent looks at a scene, selects an object in the scene, picks some distinctive characteristics of that object (relative to other objects in the same scene) and says a word that represents one or more of those characteristics.

The other agent tries to interpret this word in order to guess what is meant by the speaker.

Over the course of a series of games, the Talking Heads build a shared lexicon to talk about the objects in the scene.
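
The "pick some distinctive characteristics" step in that description can be illustrated with a small sketch. This is my own toy version, not the Talking Heads code: the scene encoding (a list of feature dictionaries) and the function names `distinctive_features` and `guess` are assumptions made purely for illustration, and the lexicon step (turning a feature into a word) is left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical scene encoding: each object is a dict of feature -> value.
Scene = List[Dict[str, str]]


def distinctive_features(scene: Scene, topic: int) -> List[Tuple[str, str]]:
    """Speaker side: (feature, value) pairs that no other object in the scene shares."""
    target = scene[topic]
    others = [obj for i, obj in enumerate(scene) if i != topic]
    return [
        (feature, value)
        for feature, value in target.items()
        if all(obj.get(feature) != value for obj in others)
    ]


def guess(scene: Scene, feature: str, value: str) -> Optional[int]:
    """Hearer side: index of the single matching object, or None if ambiguous."""
    matches = [i for i, obj in enumerate(scene) if obj.get(feature) == value]
    return matches[0] if len(matches) == 1 else None


scene = [
    {"colour": "red", "shape": "square"},
    {"colour": "red", "shape": "circle"},
    {"colour": "blue", "shape": "circle"},
]
```

In this toy scene the first object can only be singled out by its shape, the third only by its colour, and the second has no single distinctive feature at all, which is the kind of case where an agent would need to combine characteristics.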

One of the goals of this research was to play with the idea of robots negotiating shared meaning and inventing their own ontologies to describe shared scenes. In order to show just how complex this is in the real world, Professor Steels commented on a short video of two AIBO dogs watching a rolling ball together - the complexity of the sensory data and relative negotiation of space and state required to do this is quite daunting.

So, how can agents arrive at shared concepts and invent shared conventions? In this case, the experimenters began with no shared ontology, requiring agents to invent words to make links between their own lexicons through selective expression - pointing - and interpretation (essentially a kind of guessing game). The team later extended the experiment through links to Tokyo, Brussels, London and Cambridge, allowing 3,000 software agents to "travel" across the network (teleport) to play elsewhere, and further shook things up by allowing anybody to change the content of the white-boards.

At its simplest level, the basic requirements for ontological negotiation are:

  1. shared situation
  2. shared perception: physical -> neural brain states
  3. shared cognition: categorisation + conceptualisation of reality
  4. shared language: lexical + grammatical conventions

But what is a shared situation? And do we really have shared cognition? Do we really categorise things in the same way? Evidence suggests there is more variance in these factors than we perhaps acknowledge. For example, we all see colours differently - some 30% of women are estimated to "see" with four basic colours rather than the usual three (Red, Green & Blue) - and individual cultures often categorise colours in very different ways (e.g. what is green vs blue).

When somebody says "this square is brown," the process of ontological negotiation begins with the speaker...

perception -> categorisation -> naming (consulting lexicon) -> rendering -> say "This is Brown"

... and the hearer reverses the process to establish a shared understanding:

decode "This is Brown" -> look up "brown" -> categorisation -> perception of the square.

The nativist view of the origins of language (Pinker, Chomsky, et al) says that our common physiological makeup, genetic constraints on our language organ and certain innate concepts mean that new members of our group are genetically constrained in their development of language, resulting in common characteristics. The empiricist view holds that new members simply adopt the existing dominant system of language because they have the same physiological apparatus, live in the same ecology and adopt the same inductive learning mechanisms as their predecessors. Both theories depend upon the existence of a 'system' that is either learned or innate, but what if there is such variance in our perception that there is no fixed 'system'?

Relativistic cognitive science instead believes "the" system does not exist and focuses on how we create new representations (semiosis) and align/coordinate our representations and categorisation. In this model, the semiotic system is a complex adaptive system that is constantly re-shaped to maximise communicative success and shared understanding, based on non-arbitrary, negotiated conventions.

Professor Steels recounted an experiment to explore the evolution of such a complex adaptive system in a language game between agents to develop bi-directional associative memory by negotiating shared naming of object categories. The rules were very simple:

  1. invention (speaker): if no word exists for the object, invent one
  2. adoption (hearer): if you hear a new name, store it
  3. coordination (speaker + hearer): success leads to enforcement and inhibition (promote the winner and demote near competitors); failure leads to a damping effect on the word.

When this game was played by 10 agents with 10 objects across 2000 iterations, the resulting graph shows a rapid rise in success and then a levelling out as the number of names in use tends towards the number of objects they describe (an optimum system is 1:1); and this state occurs surprisingly quickly. A simpler ruleset involving just adoption also shows the same levelling out, but with a larger (non-optimal) vocabulary, and a ruleset based on just enforcement and inhibition also stabilises but not at a 1:1 ratio. The combination of adoption and coordination is the only ruleset that consistently produces an optimal negotiation of shared meaning. Even with the addition of a small degree of stochasticity (random variation) and allowing new agents to enter the game with new words every 100 turns, the results are quite stable.
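
To make the three rules concrete for myself, here is a minimal sketch of such a naming game. It is not the code used in the experiment: the score values (start at 0.5, move in steps of 0.1), the success test (the hearer already knows the speaker's word for that object), and the names `Agent`, `play_game` and `simulate` are all my own assumptions. I have also assumed that one "iteration" means one game between a random pair of agents.

```python
import random
from collections import defaultdict

LETTERS = "aeioubdgklmnprst"  # arbitrary alphabet for invented words


class Agent:
    """One participant holding word scores per object (a hypothetical design)."""

    def __init__(self, rng):
        self.rng = rng
        self.scores = defaultdict(dict)  # object -> {word: score in [0, 1]}

    def speak(self, obj):
        words = self.scores[obj]
        if not words:
            # Rule 1, invention: no word exists for this object, so make one up.
            new_word = "".join(self.rng.choice(LETTERS) for _ in range(6))
            words[new_word] = 0.5
        return max(words, key=words.get)

    def reinforce(self, obj, word, step=0.1):
        # Rule 3 on success: promote the winner, demote its competitors.
        words = self.scores[obj]
        for w in list(words):
            change = step if w == word else -step
            words[w] = min(1.0, max(0.0, words[w] + change))

    def dampen(self, obj, word, step=0.1):
        # Rule 3 on failure: weaken the word that did not work.
        self.scores[obj][word] = max(0.0, self.scores[obj][word] - step)


def play_game(speaker, hearer, obj):
    """Play one game about obj; return True if the hearer knew the word."""
    word = speaker.speak(obj)
    if word in hearer.scores[obj]:
        speaker.reinforce(obj, word)
        hearer.reinforce(obj, word)
        return True
    speaker.dampen(obj, word)
    hearer.scores[obj][word] = 0.5  # Rule 2, adoption: store the new name.
    return False


def simulate(n_agents=10, n_objects=10, n_games=2000, seed=0):
    """Run n_games between random pairs; return the agents and the outcomes."""
    rng = random.Random(seed)
    agents = [Agent(rng) for _ in range(n_agents)]
    outcomes = []
    for _ in range(n_games):
        speaker, hearer = rng.sample(agents, 2)
        outcomes.append(play_game(speaker, hearer, rng.randrange(n_objects)))
    return agents, outcomes
```

Averaging `outcomes` over a sliding window gives a success curve, and counting the distinct top-scoring words across `agents` gives the vocabulary size, so the two quantities described above can be plotted. Whether this toy version reproduces the shape of Professor Steels's graphs will depend on the assumed parameters.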

This suggests that language can self-organise. But what about shared categorisation? Agents may develop their own categorisation systems, but can they coordinate them with each other? A similar language game described by Professor Steels showed that allowing agents to use a version of the Radial Basis Function to pick a point and scan for the nearest best category example - and then say 'is this what you mean?' - generates a continuously shaped landscape that will begin to stabilise around clusters of shared categories after several iterations. This approach - discrimination -> shared expression -> repeat - is a simple approach that allows relative coordination of categories among software agents.
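
The lecture did not go into which version of the Radial Basis Function was used, so the following is only a rough sketch under my own assumptions: a Gaussian kernel, categories stored as single prototype points, and a simple "move the prototype towards the agreed example" update. The names `rbf`, `categorise` and `align` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def rbf(x: Sequence[float], centre: Sequence[float], sigma: float = 0.2) -> float:
    """Gaussian radial basis function: 1.0 at the centre, falling off with distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-d2 / (2 * sigma ** 2))


def categorise(x: Sequence[float], prototypes: Dict[str, List[float]],
               sigma: float = 0.2) -> str:
    """Return the category whose prototype responds most strongly to x."""
    return max(prototypes, key=lambda name: rbf(x, prototypes[name], sigma))


def align(prototypes: Dict[str, List[float]], name: str,
          x: Sequence[float], rate: float = 0.1) -> None:
    """After an agreed example, move the named prototype a fraction of the way to x."""
    centre = prototypes[name]
    prototypes[name] = [c + rate * (xi - c) for c, xi in zip(centre, x)]


# Example: two colour categories in an assumed RGB space.
colours = {"red": [1.0, 0.0, 0.0], "blue": [0.0, 0.0, 1.0]}
label = categorise([0.9, 0.1, 0.0], colours)  # nearest prototype is "red"
align(colours, label, [0.9, 0.1, 0.0])        # nudge "red" towards the example
```

Repeating the categorise-then-align step across many agents and many examples is one way the "continuously shaped landscape" could drift towards shared clusters, although the real experiment may well have worked differently.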

The graphs produced by this experiment were very interesting. First of all, one word often emerges as the dominant synonym for a given meaning, and then no newly invented words are able to challenge it. This curve looks very much like the classic power law implied by the evolution of a super-hub in a scale-free network. Clearly, this has many real-world analogues, such as the use of the brand name 'Hoover' to describe the object 'vacuum cleaner' in the twentieth century. Secondly, the incidence of polysemy varies over time (there are often high levels of polysemy early on, but later spikes can also emerge) as agents possess multiple words that seem to work for the same object or representation at a given time.

This area of semiotic dynamics (a subfield of dynamics) studies how semiotic relations change in time within a population. So far, this work has focused on the relatively simple level of naming objects and categories, but there are many higher levels of abstraction and complexity such as multiple words for multiple categories, grammar for objects and predicates, meta-grammar for systematicity, grammar for 2nd order predicates and language becoming its own meta-language. It will be interesting to follow this work as it tackles ever more complex aspects of the evolution of language.

Questions I wanted to ask but was not brave enough to do so among so many Computer Science people included:

  • How would the addition of a degree of preferential attachment among the agents affect the process? In other words, if agents have a simple reputation system based on how reliable they are at identifying shared meaning (probably a function of their position in the network, assuming equal 'intelligence'), then some agents will begin to control the evolution of language. What might this teach us about analogous real-world situations, such as the influence of major news agencies in the use of language to describe current events? For example, it would be interesting to trace the use of terms such as 'terrorists' vs 'insurgents' by the BBC, CNN, etc., and analyse patterns of usage over time.
  • Does the research help us identify the evolution of hub words? For example, research at the Santa Fe Institute identified small world network properties in the English language, and found that some 'hub words' are super-connected and become reference points for many new words that emerge in the language. Do we expect the same thing to happen in the negotiation of language between agents, and what can we learn from this?
  • Can we use this to negotiate shared meaning between distinct category/object sets? I understand research of this nature is underway relating to personal music collections, where playlists and categories are often quite arbitrary. I wonder whether we can apply this to collections or clouds of tags that reference multiple web bookmarks, photos or other objects, using the discrimination -> shared expression -> repeat approach to enable agents to negotiate shared meaning between different users' or different groups' tags by referring to the objects they reference?
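
As a very rough starting point for that last question, two users' tags could be compared by the overlap of the objects they point to. The sketch below is only a one-shot comparison, not the iterative discrimination -> shared expression -> repeat game, and the Jaccard measure, the 0.5 threshold and the names `jaccard` and `align_tags` are my own assumptions.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of objects two tags have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_tags(mine: Dict[str, Set[str]], theirs: Dict[str, Set[str]],
               threshold: float = 0.5) -> Dict[str, str]:
    """For each of my tags, propose the other user's tag with the best object overlap."""
    proposals = {}
    for tag, objects in mine.items():
        best = max(theirs, key=lambda t: jaccard(objects, theirs[t]), default=None)
        if best is not None and jaccard(objects, theirs[best]) >= threshold:
            proposals[tag] = best
    return proposals


# Hypothetical data: tags mapped to the identifiers of the objects they reference.
mine = {"blue": {"p1", "p2", "p3"}, "sad": {"s1"}}
theirs = {"azure": {"p1", "p2"}, "melancholy": {"s1", "s2", "s3"}}
proposals = align_tags(mine, theirs)
```

Here "blue" and "azure" share two of three objects and are proposed as a match, while "sad" and "melancholy" overlap too little to be paired automatically; that is the kind of borderline case where an agent might ask "is this what you mean?" instead.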

Inspirational stuff from Professor Steels; thanks to Pat for the invite and apologies to anybody concerned for any lack of comprehension implied by the above notes.

1 Comment

What an interesting post.
By coincidence I delivered a column on the subject of machine-mediated ontology building just yesterday.
Your piece, as always, shone light into some dark corners.
One of which was why my mother and I could never agree on beige versus grey.
