
Preview of UK Government’s open data site

Last week I was invited to a preview of the new UK open data initiative announced by the Cabinet Office.
The site – http://data.hmg.gov.uk – is still in a closed preview, but it looks good, is built on Drupal and uses some third-party data management tools. It contains a list of 1123 data sources, powered by CKAN, of which 20 have been converted to a linked data format. It has taken three months to free up the data that forms part of the initial release, and much work has gone into converting data from PDF and other non-malleable formats into CSV and linked data. Some sources have SPARQL endpoints that will allow direct queries to return RDF or JSON.
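As a rough illustration of what querying such an endpoint might look like, here is a minimal Python sketch. It assumes a standard SPARQL Protocol endpoint that honours the `application/sparql-results+json` media type. The endpoint URL, the query and the sample response are all made up for illustration, since the preview site's real endpoints are not listed here, and no network call is made.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, for illustration only (not a real data.hmg.gov.uk address).
ENDPOINT = "http://example.org/sparql"

# Hypothetical query: list ten resources together with their rdfs:label.
QUERY = """
SELECT ?school ?name WHERE {
  ?school <http://www.w3.org/2000/01/rdf-schema#label> ?name .
} LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_json(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


# Made-up response in the W3C SPARQL JSON results layout.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["school", "name"]},
    "results": {"bindings": [
        {
            "school": {"type": "uri", "value": "http://example.org/school/1"},
            "name": {"type": "literal", "value": "Example Primary School"},
        }
    ]},
})

if __name__ == "__main__":
    request = build_request(ENDPOINT, QUERY)
    print(request.full_url)
    for row in rows_from_json(SAMPLE_RESPONSE):
        print(row["name"], row["school"])
```

Against a live endpoint, the request would be passed to `urllib.request.urlopen` and the response body handed to `rows_from_json`.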
Within the data sets are some that contain comprehensive lists of all farms, schools and other entities, so these could potentially be used as pivots for mapping other data and metadata. Others are built on EduBase, which already contains 6 million triples including performance data for schools and other educational establishments. Crucially, the data is open to all for both commercial and non-commercial use, in an attempt to stimulate innovation.
See Harry Metcalfe’s summary for more details and Jeni Tennison for more on the data formats debate.
Andrew Stott, the Director of Digital Engagement, has led the project, working with Richard Stirling, Lisa James and others, relying also on help from people working across government departments, such as Emma Mulqueeny.
The project has four objectives:

  • Transparency and accountability
  • Empower citizens to drive public service reform
  • Unlock social and economic value (taxpayers paid for the data)
  • Help British research and technology take a leading role in the next-generation Web

Another consideration will be to look at policy and standards issues in the public sector as a whole. For example, there are 400+ local authorities but no consistent standards for data. Andrew said they will also try to use contract renewals with third parties as an opportunity to give government full rights over the data produced by commissioned projects, but he wants to start involving people and get some feedback before they go much further:

“We want to see what people can do with it to shake some trees in Whitehall to free up more [data].”

I was impressed both by the vision for the project and by the way Andrew’s team have gone about realising it, and it was a nice touch to discuss their work face to face rather than just read the announcement. I would strongly urge developers to answer the call to get involved, and ideally to avoid obsessing about tiny details of data formats and licensing in favour of actually building something that is (a) useful and (b) communicable to normal people rather than just data geeks. It is easy to forget that what we are doing here is so utterly marginal in terms of both investment and impact that it barely registers on either government’s or citizens’ radar right now. But there is no doubt in my mind that this is where government should be headed if it is serious about engaging better, improving services and getting more value out of existing investment.

2 Responses to Preview of UK Government’s open data site

  1. By cyberdoyle on October 5, 2009 at 1:19 pm

    Hi Lee, great article, and helping to raise awareness… (I found it via Twitter).
    Very important to engage our developer community to contribute, geeks are finally getting the recognition they deserve. Meeting of two cultures and generations for the common good. Democracy rules. Well done to dirdigeng and team for all their hard work, and good luck to all who will help bring it to the public in an understandable, easy to access way.
    Power to the People.
    chris

  2. By Tom Morris on October 6, 2009 at 6:28 pm

    Don’t mean to nitpick but the SPARQL endpoints also return SPARQL XML Results Format – which is a very simple XML format. If you do a CONSTRUCT or DESCRIBE query you get back RDF (usually as XML). If you do SELECT or ASK you’ll get back SPARQL XML Results Format or JSON. Generally, you use CONSTRUCT and DESCRIBE to filter and then use SELECT to extract the data as rows.
    What’s interesting about this? Well, SPARQL XML Results Format is W3C specified and because it’s XML you can run XSLT over it. And some of the SemWeb community have already written XSLTs that convert the data into interesting formats like GeoRSS and KML, RSS/Atom, OPML, XHTML or whatever else you find interesting. You can write a query and then pass the results to an XSLT processor which turns them into some commonly used format. In practice, one can get to the point where one can prototype a mashup using nothing more than public SPARQL endpoints and public XSLT servers (the W3C has one – http://www.w3.org/2005/08/online_xslt/ ). For geodata, you do a SPARQL query and have the result translated into KML, load it into Google Maps or Google Earth and you’ve got yourself a prototype.
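
The geodata pipeline described in the comment above can be sketched in a few lines. This is only an approximation of the idea: the comment talks about XSLT, but Python's standard library has no XSLT processor, so the sketch below does the same transformation with `xml.etree.ElementTree` instead. The variable names `name`, `lat` and `long` and the sample result are assumptions made for illustration, not taken from any real endpoint.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = "http://www.w3.org/2005/sparql-results#"
KML_NS = "http://www.opengis.net/kml/2.2"

# Made-up SELECT result in SPARQL XML Results Format.
SAMPLE_RESULTS = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="lat"/>
    <variable name="long"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Example School</literal></binding>
      <binding name="lat"><literal>51.5</literal></binding>
      <binding name="long"><literal>-0.12</literal></binding>
    </result>
  </results>
</sparql>
"""


def results_to_kml(results_xml):
    """Turn each result row with name/lat/long bindings into a KML Placemark."""
    ns = {"s": SPARQL_NS}
    root = ET.fromstring(results_xml)

    # Serialise KML elements in the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    document = ET.SubElement(kml, "{%s}Document" % KML_NS)

    for result in root.findall("s:results/s:result", ns):
        # Map variable name -> text value for this row.
        values = {
            binding.get("name"): "".join(binding.itertext()).strip()
            for binding in result.findall("s:binding", ns)
        }
        # Skip rows that lack any of the assumed variables.
        if not all(key in values for key in ("name", "lat", "long")):
            continue
        placemark = ET.SubElement(document, "{%s}Placemark" % KML_NS)
        ET.SubElement(placemark, "{%s}name" % KML_NS).text = values["name"]
        point = ET.SubElement(placemark, "{%s}Point" % KML_NS)
        # KML expects coordinates as longitude,latitude.
        coordinates = ET.SubElement(point, "{%s}coordinates" % KML_NS)
        coordinates.text = "{},{}".format(values["long"], values["lat"])

    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(results_to_kml(SAMPLE_RESULTS))
```

The resulting string can be saved with a `.kml` extension and opened in Google Earth. An XSLT stylesheet run through a separate processor would do the same job without any Python.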