Nodalities
-
22:47 Linked Data and Libraries 2011 – Agenda Finalised
» Nodalities
The Linked Data and Libraries 2011 event, to be held at the British Library in London on July 14th, is to be opened with a keynote from Dame Lynne Brindley, British Library Chief Executive.
With reports from the LOD-LAM Summit and the W3C Libraries Linked Data Working Group, plus an insight into bibliographic linked data modelling intriguingly called The Record is Dead, this is looking like a not-to-miss event.
For the full agenda, and to register early to guarantee your place, check out the event site.
Lightning Talk slots available: I am still taking submissions for the available lightning talks. Drop me a line before June 17th if you would like to propose a talk.
Image from a photo on Flickr by Fuzzyyol
-
16:54 Talis Linked Data Open Day – USA
» Nodalities
Whenever we publicise one of the Linked Data events we regularly hold in the UK, I always get a handful of responses wishing that we would run such an event on the American side of the planet.
So guess where our next Open Day is going to be held – the photo might give you a clue.
Check if you are right, and find more details from our Consulting blog.
Photo Creative Commons licensed from Rob Styles Flickr Photostream
-
16:00 Talis Sponsor Pan-European Open Data Challenge
» Nodalities
We are proud to be a Lead Sponsor for the Open Data Challenge being coordinated by Jonathan Gray from the Open Knowledge Foundation and Paul Meller from the Open Forum Academy, under the auspices of the Share PSI initiative.
This is a significant competition, with significant prizes totalling €20,000 for ideas, applications, visualisations and datasets – up to €5,000!
As you would expect from a Talis Sponsored competition, Linked Data features in the line up of attributes that entrants should be considering. Following the 5 Star Data principles espoused by Sir Tim Berners-Lee, the more machine readable, non-proprietary formatted, and linked that Open Data can be, the lower the barrier to its innovative use. This is especially true in the area of Public Sector Information, where similar or associated data is being published by several organisations or governments. In recognition of this we are, as part of our sponsorship, backing the Talis Award for Linked Data – €1,000 presented for the best use of Linked Data in any of the competition categories.
The competition will run for 60 days, so get your ideas flowing, and developers’ fingers rattling over those keys.
Watch out for a later post, when I will identify some Linked Open Data that is already available that you could use to build an entry.
-
17:03 Linked Data and Libraries 2011 – July 14th
» Nodalities
After the great success of Linked Data and Libraries 2010 we are doing it again!Linked Data and Libraries 2011 will be held at The British Library in London on Thursday July 14th. Again it will be a free event, with limited spaces allocated, so register early.
The agenda is yet to be finalised, but as per 2010 it will be a mixture of general Linked Data overviews & experience, and library Linked Data speakers. We hope to hear from the British Library, W3C Library Linked Data Incubator Group, LOD-LAM Summit, and others. We are also hoping to find time for the 10 minute lightning talks slot that worked so well last time.
Register early and/or if you would like to propose a topic or speaker, email me – richard.wallis@talis.com.
Image from a photo on Flickr by Fuzzyyol
-
15:02 Are We Getting A Right to Data?
» Nodalities
Friday night – nothing on the TV – I know! I’ll browse through the Protection of Freedoms Bill, currently passing through the UK Parliament. Sad I know, but interesting.
Let’s scroll back in time a bit to November 19th 2010 and a government press conference introduced by a video from Prime Minister David Cameron. The headline story was about the publishing of government spending and contract data, but towards the end of this 109 second short he said the following:
“… the most exciting is a new right to data. Which will let people request streams of government information and use it for social or commercial purposes. Take all this together and we really can make this one of the most open, accountable and transparent governments there is. Let me end by saying this. You are going to have so much information about what we do, how much of your money we spend doing it, and what the outcome is. So use it, exploit it, hold us to account. Together we can set a great example of what a modern democracy ought to look like.” (my emphasis)
Obviously to realise this Right to Data there needs to be some legislation, which brings me to the Protection of Freedoms Bill. This is one of those bills which covers all sorts of issues, from rules for destruction of fingerprints and DNA profiles, CCTV camera regulations, detention of terrorist suspects, to freedom of information and data protection. Zooming in on the bits on the topic of the release and publication of datasets held by public authorities, we find a set of clauses that amend the Freedom of Information Act 2000.
Re-use
After some amendments which allow for datasets and provision in electronic form we get this: “the public authority must, so far as reasonably practicable, provide the information to the applicant in an electronic form which is capable of re-use.” Unfortunately there is no definition of the term re-use. It could be argued that a pdf of some tables in a MS Word document could be re-used, whereas I believe the spirit of the legislation should be made more explicit by identifying non-proprietary data formats. I know this would be a tricky job for the parliamentary draftsmen: we would not want to restrict it to formats such as XML and csv, which could age and be replaced by something better that could then not be used because it had not been mentioned in the legislation. Even so, I believe that just using the term ‘re-use’ is far too woolly and open to [mis]interpretation.
What is [not] a dataset
This is one of the areas that raises most concern for me. Check out this wording from the Bill:
I am OK with (a) – data collected as part of an authority doing its job – and (c) – don’t change the data you have collected – publishing that raw data is important. However (b) specifically excludes data that is the product of analysis. Presumably analysis of collected data is one significant way that an authority measures the outcomes of its efforts. Understanding that analysis will help us understand the subsequent decisions and actions they make and take. I assume that there may be some specific reasons that underpin this blanket exclusion of analysis data. If there are, they should be identified, instead of generally throttling the output of useful data that will go a long way to helping with Mr Cameron’s stated ambition for us to be able to see “what the outcome is” of the spending of public money.
Release of datasets for re-use
This is a whole new section (11A) to be added to the 2000 act to cover the release of datasets. It covers ownership, copyright, and/or database right of the information to be published and states that it should be published under “the licence specified by the Secretary of State in a code of practice issued under section 45”. Section 45 basically puts into the hands of the Secretary of State the definition of the licence(s) data should be published under. As of today the Open Government Licence for public sector information is what is wanted to keep the publishing of information open. However, what is there to stop a future Secretary of State, who has a less open outlook, replacing it with far more restrictive licences? Do we not need some form of presumption of openness being attached to the Secretary of State’s powers as part of this change in legislation?
On the topic of presumptions of openness, the wording of this bill contains phrases such as “unless the authority is satisfied that it is not appropriate for the dataset to be published” and “where reasonably practicable”. It is clear that many in the public sector are not as enthusiastic about publishing data as the current government position and such vague phrases as these may well be unreasonably used by some in justifying a throttling of the stream of information. They could easily be used to build in a bureaucratic decision hurdle for each dataset to have to jump, proving its appropriateness and practicality, before publication. I am sure that it would not be beyond a parliamentary draftsman’s skill to produce wording that means that all will be published, unless a specific objection is raised for an individual dataset, for reasons of excessive effort or data protection reasons.
Up-dated data
Data published by an authority should be published under a scheme, the following applies here:
How should we interpret “any up-dated version held by the authority of such a dataset”? My interpretation is that once a dataset has been published it shall continue to be published as it changes. The precedent for this is spending data – having published authority spending for January 2011, authorities should be automatically publishing it for February and following months. But what if, in response to a request, an authority publishes the contents of a spreadsheet used to track the amount of salt applied to roads in its area during winter 2010-11 and then uses a different spreadsheet for the following winter? Does the output of that new spreadsheet constitute a new dataset, or an up-date to its predecessor? From the wording in the Bill it is not clear.
Who does it cover?
I probably need a bit of help here from those that understand the public sector better than I do, but I am suspicious that references to the organisations listed in Schedule 1 and “the wider public sector”, do not take the net wide enough to cover some of the data that is relevant to our daily lives but is delivered on behalf of some authorities by third parties. For example I am aware that recently a large city was not able to inform citizens of their rubbish collection schedules because that data was considered as commercially restricted by their service provider.
So in summary, I welcome the commitment to a right to data being realised by streams of government information about what we do, how much of our money is spent doing it, and what the outcomes are. However, I am sceptical as to how effective the measures in the current Protection of Freedoms Bill will be in delivering them, especially in the light of very recent comments made by the Prime Minister highlighting the "enemies of enterprise" in Whitehall and town halls across the country, attacking what he called the "mad" bureaucracy that holds back entrepreneurs. Those enemies are just the people who might take the wording of this bill as ammunition in their cause.
Whilst being concerned about this topic, I have been wondering why few are commenting on it. Are the majority just taking the press conference statements by David Cameron, and his fellow Ministers, as indications of a battle won, or am I missing something? I promote Sir Tim Berners-Lee’s 5 Star Data as the steps towards a Web of Linked Data – if we don’t get the publishing of public sector data to at least 3 star standard (Available as machine-readable structured data – in non-proprietary format), many of the current ambitions may remain just that, ambitions. That would be a massive missed opportunity. So are we getting a right to data? – or just some provisions to extend the Freedom of Information Act a bit further in the dataset direction? I’m not sure.
Personal note: As you may tell from the above, I am no expert on the interpretation of parliamentary legislation, and I have left several unanswered questions hanging in this post. Any help in clarifying my thinking, confirming or disproving my assumptions, or answering some of those questions, will be gratefully received in comments to this post or your own posted thoughts.
-
13:09 Talis Group completes the sale of its Library division to Capita Group plc
» Nodalities
3 March 2011, Birmingham, UK
Talis Information Limited, the library division of Talis Group Ltd, has been acquired by the UK’s leading outsourcing firm, Capita Group plc. The transaction is valued at £18.5m with an additional £2.5m due, based on performance over the next 12 months. Talis Information Ltd has a range of around 100 academic and public library clients based in the UK and employs 42 staff, all of whom are based in Birmingham, UK.
Talis Group’s other portfolio companies including Talis Education Ltd, Talis Systems Ltd and Talis Inc are unaffected by the acquisition of Talis Information Ltd. Talis Group’s other divisions provide a SaaS-based semantic web platform and related applications including Talis Aspire, a resource list management solution for higher education customers.
-
16:17 Linked Spending Data – How and Why Bother Pt3
» Nodalities
As is often the way, events have conspired to prevent me from producing this third and final part in this How & Why of Local Government Spending Data as soon as I wanted. So my apologies to those eagerly awaiting this latest.
To quickly recap, in Part 1 I addressed issues around why pick on spending data as a start point for Linked Data in Local Government, and indeed why go for Linked Data at all. In Part 2, I used some of the excellent work that Stuart Harrison at Lichfield District Council has done in this area as examples to demonstrate how you can publish spending data as Linked Data, for both human and programmatic consumption.
I am presuming that you are still with me on my basic assumptions “…publishing this [local government spending] data is a good thing” and “Publishing Local Authority data, such as local spending data, as ‘Linked Data’ is also a good thing”, and that the technique of using URIs to name things in a globally unique way (that also provides a link to more information) is not giving you mental indigestion. So, I now want to move on to some of the issues that are causing debate in the community, which come under the headings of ontologies and identifiers.
Ontologies
An ontology, according to Wikipedia, is a formal representation of knowledge as a set of concepts within a domain - an ontology provides a shared vocabulary, which can be used to model a domain – that is, the type of objects and/or concepts that exist, and their properties and relations. So in our quest to publish spending data what ontology should we use? The Payments Ontology, with the accompanying guide to its application, is what is needed. Using it, it becomes possible to describe individual payments, or expenditure lines, and their relationships to the authority (payment:payer), the supplier (payment:payee), the category (payment:expenditureCategory), etc. The next question is how you identify the things that you are relating together using this ontology.
Let’s take this one step at a time:
- Give the expenditure line, or individual payment, an identifier possibly generated by our accounts system. eg. 8605670.
- Make that identifier unique to our local authority by prefixing it with our internet domain name. eg. http://spending.lichfielddc.gov.uk/spend/8605670 – note the prefix of ‘http://’. This enables anyone wanting detail about this item to follow the link to our site to get the information.
- Associate a payer with the payment with an RDF statement (or triple) using the Payments Ontology:
http://spending.lichfielddc.gov.uk/spend/8605670
payment:payer
http://statistics.data.gov.uk/id/local-authority/41UD .
Note I am using an identifier for the payer that is published by statistics.data.gov.uk. That is so that everyone else will unambiguously understand which authority is the one responsible for the payment.
- Follow the same approach for associating the payee:
[spending.lichfielddc.gov.uk]
payment:payee
http://spending.lichfielddc.gov.uk/supplier/bristow-sutor .
- And then repeat the process for categorisation, payment value etc.
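The first steps above can be sketched in a few lines of Python. This is only an illustration, not how Lichfield's site is built: the helper names are hypothetical, and the namespace used to expand the payment: prefix is an assumption that should be checked against the published Payments Ontology. The output is one statement in N-Triples form, where every URI is wrapped in angle brackets.

```python
# Assumed namespace for the payment: prefix; verify against the published ontology.
PAYMENT_NS = "http://reference.data.gov.uk/def/payment#"
# Base address taken from the example identifier in the post.
SPEND_BASE = "http://spending.lichfielddc.gov.uk/spend/"
# Identifier for the paying authority, published by statistics.data.gov.uk.
PAYER_URI = "http://statistics.data.gov.uk/id/local-authority/41UD"


def mint_spend_uri(local_id: str) -> str:
    """Prefix the accounts-system identifier with the authority's own domain."""
    return SPEND_BASE + local_id


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialise one statement whose subject, predicate and object are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line_uri = mint_spend_uri("8605670")
payer_statement = triple(line_uri, PAYMENT_NS + "payer", PAYER_URI)
print(payer_statement)
```

The same triple() helper could be reused for the payee and category statements once their subject and object URIs are known.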
This immediately throws up a couple of questions, such as why use a locally defined identifier for the payee – surely there is an identifier I can use that others will recognise, such as a company or VAT number! There are, but as of the moment there are no established sets of URI identifiers for these. OpenCorporates.com are doing some excellent work in this area, but Companies House, the logical choice for publishing such identifiers, have yet to do so. Pragmatically it is probably a good idea to have a local identifier anyway and then associate it with another publicly recognised identifier:
http://spending.lichfielddc.gov.uk/supplier/bristow-sutor
owl:sameAs
http://opencorporates.com/companies/uk/01431688 .
Identifiers
Because this is all very new and still emerging, we now find ourselves in a bit of a chicken-or-egg situation. I presume that most authorities have not built a mini spending website, like Lichfield District Council has, to serve up details when someone follows a link like this: [spending.lichfielddc.gov.uk]. You could still use such an identifier using your authority domain, and plan to back it up later with a web service to provide more information. Or you could let someone else, who takes a copy of your raw data, do it for you as OpenlyLocal might: [openlylocal.com] or maybe how the project we are working on with LGID might: [id.spending.esd.org.uk]. In the open, flexible world of Linked Data it doesn’t matter too much which domain an identifier is published from, or for that matter how many [related] identifiers are used for the same thing.
It does matter, however, for those looking to the identifying URI for some idea of authority. As I say above, technically it doesn’t matter whose domain the identifier comes from, but I believe it would be better overall if it came from the authority whose payment it is identifying. Which puts us back in the chicken-or-egg situation as to resolving the URI to serve up more information. The joy of Linked Data is that, provided aggregators consider the possibility of being able to identify source authorities’ data accurately when they encode it, it should be possible to automatically retrofit links between URIs at a later date.
In summary, over this series of posts we are seeing a technology which, although it has obvious benefits, is still early on the development curve, being applied to a process which is also new and scary for many. That is an ideal breeding ground for cries of pain and assertions of ‘it doesn’t work’ or ‘not worth bothering’, yet it has the potential to provide a powerful foundation for a future data rich environment that is open, accessible, and beneficial to authorities, government, citizens, and UK Plc. Yes it is worth bothering, just don’t expect benefits on day, or even month, one.
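One way to picture retrofitting links between several identifiers for the same thing is to group URIs connected by owl:sameAs statements into sets. The sketch below is a hedged illustration using a simple union-find; the function name is hypothetical, and the aggregator.example URI is invented purely for the example. Only the Lichfield and OpenCorporates identifiers come from the post.

```python
from collections import defaultdict


def same_as_groups(pairs):
    """Group URIs into sets of identifiers that sameAs statements link together."""
    parent = {}

    def find(uri):
        # Follow parent pointers to the representative, shortening the path as we go.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


links = [
    ("http://spending.lichfielddc.gov.uk/supplier/bristow-sutor",
     "http://opencorporates.com/companies/uk/01431688"),
    # Hypothetical identifier minted by an aggregator for the same supplier.
    ("http://aggregator.example/supplier/123",
     "http://opencorporates.com/companies/uk/01431688"),
]
groups = same_as_groups(links)
print(groups)
```

Because the local and aggregator identifiers both point at the same publicly recognised identifier, they end up in one group even though neither mentions the other directly.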
-
12:00 Linked Data: evolving the Web into a Global Data Space
» Nodalities
As Linked Data becomes more established, a new book has been published that captures the state of the art and current best practices in the field. Authored by Dr Tom Heath, lead researcher at Talis, and Professor Christian Bizer of the Freie Universität Berlin, “Linked Data: Evolving the Web into a Global Data Space” introduces the basic principles and rationale of Linked Data and provides detailed guidance for those exploring this emerging area of Web technology.
Abstract:
The World Wide Web has enabled the creation of a global information space comprising linked documents. As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to raw data not currently available on the Web or bound up in hypertext documents. Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards – the Web of Data. In this Synthesis lecture we provide readers with a detailed technical introduction to Linked Data. We begin by outlining the basic principles of Linked Data, including coverage of relevant aspects of Web architecture. The remainder of the text is based around two main themes – the publication and consumption of Linked Data. Drawing on a practical Linked Data scenario, we provide guidance and best practices on: architectural approaches to publishing Linked Data; choosing URIs and vocabularies to identify and describe resources; deciding what data to return in a description of a resource on the Web; methods and frameworks for automated linking of data sets; and testing and debugging approaches for Linked Data deployments. We give an overview of existing Linked Data applications and then examine the architectures that are used to consume Linked Data from the Web, alongside existing tools and frameworks that enable these. Readers can expect to gain a rich technical understanding of Linked Data fundamentals, as the basis for application development, research or further study.
You can access a copy of Linked Data: Evolving the Web into a Global Data Space here.
-
14:02 Marketing the Semantic Web – Semantic Link podcast – Episode 3
» Nodalities
The Semantic Link podcast panel are back with the third instalment of the podcast series. This month, two special guests, Krista Thomas, VP Marketing, Ad.ly (formerly VP Marketing, OpenCalais, Thomson Reuters), and Scott Brinker, President and CTO at Ion Interactive, Inc., join the panel to discuss the complexities around marketing the semantic web.
As you would expect, when marketing an intricate topic like the semantic web we are met with challenges. The panel discusses: who are we marketing to? To those who will utilise the technology? Independent developers? Consumers? Or the social community? The use of terminology in the semantic web world is also explored, amongst other key issues.
You can listen to this month’s podcast here and catch up on previous conversations too.

-
8:57 David Wood talks with Talis
» Nodalities
A short while ago, my colleague Zach Beauvais podcasted with the Vice President of Engineering at Talis Inc., David Wood. In this conversation, David discusses his background, Linked Data and SPARQL. He also talks about Talis Inc.’s first US customer: the US Government Printing Office (GPO) and its Persistent URL infrastructure, which provides persistent Web addresses for critical government documents and is primarily used by the more than 1,200 Federal Depository Libraries. The PURL server uses the PURLz open source software, the development of which was led by David while at Zepheira, and complements the data hosting and search capabilities of the Talis Platform with identifier management functionality.
For more information, you can follow David on Twitter or read his blog.
-
12:04 Talis Inc CEO, Bernadette Hyland speaks to The Semantic Link – Episode 1 and 2
» Nodalities
Each month, Talis Inc CEO Bernadette Hyland participates in The Semantic Link podcast series alongside other Semantic thought leaders. Paul Miller, the host of the series, introduces episode one and the “PodPanel”, which includes: Peter Brown, currently Chair of the Board of Directors of standards body OASIS; Christine Connors, a consultant specialising in ontology and taxonomy design and related issues; Eric Franzon, Vice President of WebMediaBrands; Ivan Herman, Semantic Web Activity Lead for the World Wide Web Consortium (W3C); Eric Hoffer, Consultant; and Andraz Tori, CTO of Zemanta. The second instalment is also now available.
-
11:56 Nodalities Issue 12 – now available
» Nodalities
Issue 12 of Nodalities is now available for download.

In this issue, some rather practical things that Linked Data is good at solving are being put to use saving lives: quite literally, as Bart van Leeuwen explains in our cover story. Simple ideas joining up public data and GIS devices are helping the Amsterdam fire service get their equipment to the scenes of fires more quickly and safely.
Elsewhere, Martin Belam, an information architect at the Guardian, tells us about their approach to Linked Data, and what it means to them. Talis’ Leigh Dodds outlines some of the challenges and opportunities of Linked Data in an evolving world in his article. Also supporting Linked Data research is the multi-organisational LATC Project, which is introduced in this issue. And finally, Tim Hodson discusses a very practical approach to starting with Linked Data, and may also discuss eating an elephant.
You can subscribe to Nodalities for free here and read previous issues here.
-
16:39 Linked Spending Data – How and Why Bother Pt2
» Nodalities
I started the previous post in this mini-series with an assumption – ..working on the assumption that publishing this [local government spending] data is a good thing. That post attracted several comments, fortunately none challenging the assumption. So learning from that experience I am going to start with another assumption in this post. Publishing Local Authority data, such as local spending data, as ‘Linked Data’ is also a good thing. Those new to this mini-series, check back to the previous post for my reasoning behind the assertion.
In this post I am going to be concentrating more on the How than the Why Bother.
To help with this I am going to use some of the excellent work that Stuart Harrison at Lichfield District Council has done in this area as examples. Take a look at the spending data part of their site: spending.lichfielddc.gov.uk/. On the surface, navigating your way around the site looking at council spend by type, subject, month, and supplier is the kind of experience a user would expect. Great for a website displaying information about a single council. However, it is more than a web site. Inspection of the Download data tab shows that you can get your hands on the source data in csv format. Here is one line, representing a line of expenditure, from that data:
" [statistics.data.gov.uk] District Council","2010-04-06","7747"," [spending.lichfielddc.gov.uk] & SUTOR","401","Revenue Collection","Supplies & Services","Bailiff Fees",""
… which represents the data displayed on this human readable page:

Looking through the csv, you can pick out the strings of characters for information such as the date, supplier name, department name etc. In addition you can pick out a couple of URIs:
- [statistics.data.gov.uk] – The UK Government identifier for Lichfield DC
- [spending.lichfielddc.gov.uk] – Lichfield’s identifier for this payment
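Picking the URIs out of such a line is easy to do programmatically. The sketch below uses Python's standard csv module on a sample row; the row is an assumption assembled for illustration from values that appear in this series (its column order and URI columns are not taken from Lichfield's actual download), and split_row is a hypothetical helper name.

```python
import csv
import io

# Illustrative row only: the layout is assumed, not copied from the real file.
sample = (
    '"http://statistics.data.gov.uk/id/local-authority/41UD","2010-04-06","7747",'
    '"http://spending.lichfielddc.gov.uk/spend/8605670","401","Revenue Collection",'
    '"Supplies & Services","Bailiff Fees",""\n'
)


def split_row(fields):
    """Separate fields that are http URIs from non-empty plain text fields."""
    uris = [f for f in fields if f.startswith(("http://", "https://"))]
    literals = [f for f in fields if f and f not in uris]
    return uris, literals


row = next(csv.reader(io.StringIO(sample)))
uris, literals = split_row(row)
print(uris)
print(literals)
```

Using csv.reader rather than splitting on commas matters here because quoted fields such as supplier names may themselves contain commas or ampersands.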
In the context of csv, that’s all these URIs are, identifiers. However, because they are http URIs, you can click through to the address to get more information. If you do that with your web browser you get a human readable representation of the data. These sites also provide access to the same data, formatted in RDF, for use by developers.
You can see that data by adding ‘.rdf’ to the end of the address, thus: [spending.lichfielddc.gov.uk] and then selecting the ‘view source’ option of your browser for the page of gobbledegook that you get back.
Inspecting the RDF, you will see that most things, except descriptive labels and financial values, are now identified as URIs such as [spending.lichfielddc.gov.uk] and [spending.lichfielddc.gov.uk] . Again, if you follow those links, you will get a human readable representation of that resource, and the RDF behind it by adding a ‘.rdf’ suffix.
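For a developer, the same inspection can be done in code. The following is a minimal sketch, assuming a small hand-written RDF/XML document in place of a real response from the site: the payment namespace and the category URI in the sample are assumptions for illustration, and rdf_url and linked_resources are hypothetical helper names. It uses only the standard library's XML parser rather than a dedicated RDF toolkit.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_url(resource_uri: str) -> str:
    """Build the address of the RDF view by adding the '.rdf' suffix."""
    return resource_uri.rstrip("/") + ".rdf"


# Hand-written sample; the namespace and category URI are assumed, not fetched.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:payment="http://reference.data.gov.uk/def/payment#">
  <rdf:Description rdf:about="http://spending.lichfielddc.gov.uk/spend/8605670">
    <payment:expenditureCategory rdf:resource="http://spending.lichfielddc.gov.uk/category/example"/>
    <rdfs:label>Bailiff Fees</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def linked_resources(rdf_xml: str):
    """List (property, target URI) pairs for every element that links to a resource."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter():
        target = element.get(f"{{{RDF_NS}}}resource")
        if target is not None:
            links.append((element.tag, target))
    return links


address = rdf_url("http://spending.lichfielddc.gov.uk/spend/8605670")
links = linked_resources(SAMPLE)
print(address)
print(links)
```

Labels and values appear as element text, while everything that links onward carries an rdf:resource attribute, which is the distinction the paragraph above describes.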
The eagle-eyed, inspecting the RDF-XML for Lichfield payment number 8605670, will have noticed a couple of things. Firstly, a liberal sprinkling of elements with names like payment:expenditureCategory or payment:payment. These come from the Payments Ontology as published on data.gov.uk as the recommended way of encoding spending, and other payment associated data, in RDF.
Secondly, you may have spotted that there is no date, or supplier name or identifier. That is because those pieces of information are attributes associated with a payment – invoice number 7747 in this case.
Zooming out from the data for a moment, and looking at the human readable form, you will see that most things, like spend type, invoice number, supplier name, are clickable links, which take you through to relevant information about those things – address details & payments for a supplier, all payments for a category etc. This intuitive natural navigation style often comes as a positive consequence of thinking about data as a set of linked resources instead of the traditional rows & columns that we are used to. Another great example of this effect can be found on a site such as the BBC Wildlife Finder. That is not to say that you could not have created such a site without even considering Linked Data, of course you could. However, data modelled as a set of linked resources almost self-describes the ideal navigation paths for a user interface to display it to a human.
The Linked Data practice of modelling data, such as spending data, as a set of linked resources and identifying those resources with URIs [which if looked up will provide information about that resource] is equally applicable to those outside of an individual authority. By being able to consume that data, whilst understanding the relationships within it and having confidence in the authority and persistence of the identifiers within it, a developer can approach the task of aggregating, comparing, and using that data in their applications more easily.
So, how do I (as a local authority) get my data from its raw flat csv format, in to RDF with suitable URIs and produce a site like Lichfield’s? The simple answer is that you may not have to – others may help you do some, if not all, of it. With help from organisations such as esd-toolkit, OpenlyLocal, SpotlightOnSpend, and with projects such as the xSpend project we are working on with LGID, many of the conversion [from csv], data formatting processes, and aggregation are being addressed – maybe not as quickly or completely as we would like, but they are. As to a human readable web view of your data, you may be able to copy Stuart by taking up the offer of a free Talis Platform Store and then running your own web server with his code that he hopes to share as open source. Alternatively it might be worth waiting for others to aggregate your data and provide a way for your citizens to view your data.
As easy as that then! – Well not quite, there are some issues about URI naming and creation, and how you bring the data together that still do need addressing by those engaged in this. But that is for Part 3….
-
14:52 Linked Spending Data – How and Why Bother Pt1
» Nodalities
National Government instructing the 300+ UK Local Authorities to publish “New items of local government spending over £500 to be published on a council-by-council basis from January 2011” has had the proponents of both open, and closed, data excited over the last few months. For this mini series of posts I am working on the assumption that publishing this data is a good thing, because I want to move on and assert that [when publishing] one format/method to make this data available should be Linked Data.
This immediately brings me to the Why Bother? bit. This itself breaks into two connected questions – Why bother publishing any local authority data as Linked Data? and Why bother using the unexciting, simplistic spending data as a place to start?
I believe that spending data is a great place to start, both for publishing local government data and for making such data linked. Someone at national level was quite astute choosing spending as a starting point. To comply with the instruction all an authority has to do is produce a file containing five basic elements for each payment transaction: An Id, a date, a category, a payee, and an amount. At a very basic level it is very easy to measure if an authority has done that or not.
Guidance from data.gov.uk expands on this a little by mandating the following:
Body: This should be the URI that represents (or more properly ‘identifies’ – see below) the local authority at statistics.data.gov.uk, e.g. [statistics.data.gov.uk]
Date: Should ideally be the payment date as recorded in purchase or general ledger
Transaction number: To identify within authority’s system, for future reference
Amount: In Sterling recorded in finance system
Supplier Details: Name and individual authority id for supplier plus where possible Companies House, Charity Registration, or other recognised identifier
Expense Area: The part of the authority that spent the amount
Service Categorization: Depending on the accounts system this may be easy or quite difficult. There are two candidates for categorization – CIPFA’s BVACOP classification and the Proclass procurement classification system.
… a little more onerous, possibly around the areas of identifying company numbers and Service Categorization, but not much room for discussion/interpretation.
As to the file formats to publish data, the same advice mandates: “The files are to be published in CSV file format”, supplemented by: “Authorities may wish to publish the data in additional formats as well as the CSV files (e.g. linked data, XML, or PDFs for casual browsers). There is no reason why they should not do this, but this is not a substitute for the CSV files.”
So fairly clear, and measurable, then. You either have published your required basic elements of data in a CSV format file, or you have not. Couple this with the political ambitions and drive behind the Government’s Transparency Agenda, and local authorities will have difficulty in not delivering this, although some are being a bit tardy and others seem reluctant to publish in formats other than PDF.
OK so why bother with applying Linked Data techniques to this [boring] spending data? Well, precisely because it is simple data, it is comparatively easy to do, and because everybody is publishing this data the benefits of linking should soon become apparent. Linked Data is all about identifying things and concepts, giving them globally addressable identifiers (URIs) and then describing the relationships between them.
For those new to Linked Data, the use of URIs as identifiers often causes confusion. A URI, such as http://statistics.data.gov.uk/id/local-authority-district/00CN, is a string of characters that is as much an identifier as the payroll number on your pay-check, or a barcode on a can of beans. It has a couple of attributes that make it different from traditional identifiers. Firstly, the first part of it is created from the Internet domain name of the organisation that publishes the identifier. This means that it can be globally unique. Theoretically you could have the same payroll number as the barcode number on my can of beans – adding the domain avoids any possibility of confusion. Secondly, because the domain is prefixed by http:// it gives the publisher the ability to provide information about the thing identified, using well established web technologies. In this particular example, [statistics.data.gov.uk] is the identifier for Birmingham City Council; if you click on it [using it as an internet address] data.gov.uk will supply you with information about it – name, location, type of authority etc.
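To make those two attributes concrete, here is a minimal Python sketch, using only the standard library, that pulls the example URI apart into the publisher's domain and the publisher's own local code. The helper name `describe_uri` and the dictionary keys are made up for this illustration and are not part of any data.gov.uk tooling.

```python
from urllib.parse import urlparse

def describe_uri(uri):
    """Split an HTTP URI into the parts discussed above (illustrative helper)."""
    parts = urlparse(uri)
    return {
        # "http" means the identifier can also be looked up on the web
        "scheme": parts.scheme,
        # the publisher's domain is what makes the identifier globally unique
        "publisher_domain": parts.netloc,
        # the last path segment is the publisher's own code for the thing
        "local_id": parts.path.rsplit("/", 1)[-1],
    }

info = describe_uri("http://statistics.data.gov.uk/id/local-authority-district/00CN")
print(info)
```

The local code on its own could clash with someone else's numbering; it is the combination with the domain that removes the ambiguity.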
Following this approach, creating URI identifiers for suppliers, categories, and individual payments and defining the relationships between them using the Payments Ontology (more on this when I come on to the How) leads to a Linked Data representation of the data. In technical terms a comparatively easy step using scripts etc.
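As a rough idea of what such a script might look like, the sketch below turns one CSV row with the five basic elements into N-Triples lines. Everything specific in it is an assumption made for illustration: the `BASE` and `PAY` namespaces are placeholders (a real conversion would use the authority's own URI scheme and the actual Payments Ontology terms), the column names are invented, and minting supplier URIs from a name slug is a deliberately naive choice.

```python
import csv
import io

# Placeholder namespaces (assumptions): swap in the authority's real URI
# scheme and the real Payments Ontology property URIs.
BASE = "http://spending.example.org/id/"
PAY = "http://example.org/def/payment#"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def slug(text):
    """Turn a free-text name into a URI-safe token, e.g. 'Acme Ltd' -> 'acme-ltd'."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return "-".join(cleaned.split())

def row_to_triples(row):
    """Mint URIs for the payment, supplier and category, and link them."""
    payment = f"<{BASE}payment/{row['id']}>"
    supplier = f"<{BASE}supplier/{slug(row['payee'])}>"
    category = f"<{BASE}category/{slug(row['category'])}>"
    date = row["date"]
    amount = row["amount"]
    return [
        f"{payment} <{PAY}payee> {supplier} .",
        f"{payment} <{PAY}category> {category} .",
        f'{payment} <{PAY}date> "{date}"^^<{XSD_DATE}> .',
        f'{payment} <{PAY}amount> "{amount}"^^<{XSD_DECIMAL}> .',
    ]

# Invented sample row with the five basic elements.
SAMPLE = """id,date,category,payee,amount
1001,2010-11-02,Office Supplies,Acme Stationery Ltd,742.50
"""

triples = [t for row in csv.DictReader(io.StringIO(SAMPLE)) for t in row_to_triples(row)]
print("\n".join(triples))
```

Where a Companies House number or other recognised identifier is available, it would make a far more stable basis for a supplier URI than a slug of the name.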
By publishing Linked Spending Data and loading it into a Linked Data store, as Lichfield DC have done, it becomes possible to query it, to identify things like all payments for a supplier, or suppliers for a category, etc.
If you then load data for several authorities into an aggregate store, as we are doing in partnership with LGID, those queries can identify patterns or comparisons across authorities. Which brings me to ….
Why bother publishing any local authority data as Linked Data? Publishing as Linked Data enables an authority’s data to be meshed with data from other authorities and other sources such as national government. For example, the data held at statistics.data.gov.uk includes which county an authority is located within. By using that data as part of a query, it would for instance be possible to identify the total spend, by category, for all authorities in a county such as the West Midlands. As more authority data sets are published, sharing the same identifiers for authority, category, etc., they will naturally link together, enabling the natural navigation of the information between council departments, services, costs, suppliers, etc. Once this step has been taken and the dust settles a bit, this foundation of linked data should become an open data platform for innovative development and the publishing of other data that will link in with this basic but important financial data.
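The county example can be sketched in a few lines of plain Python. In practice this would more likely be a SPARQL query against the aggregate store; the version below just shows the shape of the join. All URIs, categories and amounts here are invented sample data, and `spend_by_category` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal

# Invented mapping of authority URI -> county URI, standing in for the
# kind of data the text says is held at statistics.data.gov.uk.
AUTHORITY_COUNTY = {
    "http://example.org/id/authority/A": "http://example.org/id/county/X",
    "http://example.org/id/authority/B": "http://example.org/id/county/X",
    "http://example.org/id/authority/C": "http://example.org/id/county/Y",
}

# Invented payments: (authority URI, category, amount).
PAYMENTS = [
    ("http://example.org/id/authority/A", "catering", Decimal("600.00")),
    ("http://example.org/id/authority/B", "catering", Decimal("900.50")),
    ("http://example.org/id/authority/B", "transport", Decimal("1200.00")),
    ("http://example.org/id/authority/C", "catering", Decimal("5000.00")),
]

def spend_by_category(county_uri):
    """Total spend per category for every authority located in the county."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for authority, category, amount in PAYMENTS:
        if AUTHORITY_COUNTY.get(authority) == county_uri:
            totals[category] += amount
    return dict(totals)

totals = spend_by_category("http://example.org/id/county/X")
print(totals)
```

The join only works because both datasets use the same URI for each authority, which is the point about shared identifiers made above.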
There are however some more technical issues, URI naming, aggregation, etc., to be overcome or at least addressed in the short term to get us to that foundation. I will cover these in part 2 of this series.
-
16:32 A Year of Open Government Data: Transparency, but also Innovation
» Nodalities
Towards the end of 2010, Wikileaks generates many headlines as it publishes information on the web, causing controversy and leading to talk about politicians hiding information from the public. Reporters and commentators express shock or admiration when telling the story of a rogue organisation making governmental information public. What has not been as mainstream is that for the past year or more, governments around the world have been doing something very similar themselves: publishing information online.
Big names like President Obama, Sir Tim Berners-Lee and the headliners at big events like the International Open Government Data Conference favour publishing public data for transparency and benefits to society. This all finally began to take off in 2010. Governments from around the world have been developing their public information strategies, with the launches of data.gov and data.gov.uk and data.govt.nz.
This is all taking place at a time of economic restraint. Dr Martin Read from the UK Cabinet Office’s Efficiency Reform Board explained in a recent interview: “If you are going to improve the efficiency of something, making that change involves risk and innovation … If they get it wrong, they’re hauled up in front of a committee for interrogation.” (moderngov, November 2010) It may seem tricky to justify the expense of big projects like data.gov.uk, and there certainly seems to be a huge amount of pressure.
Nevertheless, governments are proving themselves committed to prioritising data publishing. Towards the end of last year, the UK Prime Minister announced that every item of governmental spending over £25,000 will be published online, and updated monthly. He emphasised the importance of this publication in terms of transparency, inviting the public to scrutinise the data. Interestingly, he also said: “This scrutiny will act as a powerful straightjacket on spending, saving us a lot of money.” So, not only is data publishing seen as a benefit to democracy, but also as a useful way to “flag up waste”.
While that press conference was taking place, developers and civil servants were gathered together elsewhere at the Open Government Data Camp (disclosure, Talis was a sponsor). At the event, much was made of the modelling and tools which have been developed with open data in mind: particularly the Linked Data API, which allows developers from just about any web background to work with data.gov.uk’s data very quickly. Visualisations demonstrated what can be done with well-structured data.
One of the things this high-level data publishing has done is raise the standard for what can be published and developed. Last year, we built a proof-of-concept app for the Department of Business Innovation and Skills (BIS) to illustrate the potential of applications of this data. A few minutes spent on DEFRA’s UK Climate Projections site shows what can happen when raw data is matched with a plan, and is designed with a citizen in mind. Anyone can check the primary source for their government’s climate policy, and it doesn’t take a climatologist to understand it. A little further development allows fully-fledged applications to be built that are instantly useful: one available on the front page of data.gov.uk lets me download an app that helps me plan my cycle route!
Open government data is probably good for transparency. But it’s also got plenty of potential to seed ideas that add value to this information. Innovators know that there are more people with better ideas outside our organisations than could possibly be in them, so sharing means that those ideas can be developed into products and services that are mutually beneficial to everyone. The web industry routinely works with open-source software that’s been at least partly built by others, and this open-source mentality might just be an incredibly useful piece in the public-sector machinery. Open business models work very well with ideas.
2011 promises to be the year when all this data gets put to use. I was recently invited to a press conference at which the Deputy Prime Minister confirmed the UK’s commitment to published data as a priority and even a recognised civil liberty. The story will shift to more local applications of big public data tools. January will see the publication of local authorities’ spending data, and public bodies will be looking to add value to this data, bringing the headlines of open data to life in the places we live.
With a bit of thought into how data is published in the first place, and a plan for encouraging people with good ideas to work with this information, this investment in data publishing could be more than just a tick-box exercise for a political transparency agenda. I hope that this year, it won’t be Wikileaks-level events that get people talking about open data publishing. We should notice it improving services we use, and see whole new applications for the bits and pieces of information that make up our public lives.
-
14:37 European Summer School
» Nodalities
Talis is delighted to be one of the sponsors of the 8th European Summer School on Ontological Engineering and the Semantic Web (SSSW 2011). There will be more about this in coming posts, but just to start off:
We are sponsoring it for a very simple reason. The mix of theoretical, practical and collaboration skills used by all the students involved from across Europe directly corresponds to how we work at Talis. It’s an environment of support and challenge, contribution and connection that has proved beneficial for all involved over the years. Talis is proud to contribute and participate to further the aims of the community.
Talis is a small and ambitious company of likeminded, motivated people. A phrase we often use here is Human Scale. Culturally what we mean by that is we like working closely with people who we all know, whether as employees of Talis or (more likely) over time collaborating as partners in joint endeavours.
We want to grow our company and contribute to the communities we belong to. We know that it is by fostering relationships with others driven by the same passion to collaborate and learn that we can build on the ambitions we have for ourselves and for the communities we belong to. One particular aspect of the Summer School is this same notion of social connectedness, a personal network of trusted relationships that challenge and enhance the experience for everyone.
-
12:33 Information as a Civil Liberty
» Nodalities
“Free citizens must be able to hold big institutions and powerful individuals to account.”
I attended a speech at the Institute for Government by UK Deputy Prime Minister Nick Clegg at which he outlined the government’s stance on civil liberties. This topic is one I am particularly passionate about as a citizen of two democracies, and as a lover of history and human communication, but what was there to interest a software evangelist?
Mr Clegg’s speech is available as a transcript from his party’s site, so you can have a look at the same words I heard. If you read through a lot of the political positioning (references to “Labour”, for non-UK readers, refer to the majority party of the previous government), you get to the bit that interests me as a Talisian as well as a human.
The final point talks about citizens having the right to public information, and the right to speak out about what government (and, notably, publicly-subsidised industry) is doing. The freedom of information and freedom of speech are under the same heading. As Clegg put it:
“It is a modern right to information combined with traditional freedom of expression.”
Examples are given of current transparency measures, including the publishing of particular datasets that are already being used in innovative ways and to hold the government accountable. It’s clear from the speech that transparency is a priority, and that publishing data is seen as fundamental to this.
The theme of balancing security and freedom is repeated throughout the talk, alluding to the fact that some information in any government is clearly going to be kept secret. But the emphasis is on publishing wherever possible, and it was interesting that this felt like the most specific theme of an otherwise very high-level speech. This is an area of public policy that has been changing through the launch of data.gov.uk and the continued efforts of two successive governments (and, interestingly, all three major UK parties) to put public data online. The idea that these datasets will be used, reused, mashed up and seed innovation is at the forefront of these talks. This isn’t just data that can be seen, it’s data that can be used.
So, this government seems committed to continuing the trend for transparency through public information, and for their data to be made available online and in useful ways. The emphasis in this speech, however, adds a new dimension to the commitment, at least the way I understand it. It’s not just that data is a right of any free citizen—the Prime Minister said as much before he was PM—but that this right goes hand-in-hand with the citizen’s right to free speech.
Government publishing its data online, free to reuse and feed applications that make it easier to interact with the information has been a huge step. Alongside this is the area of libel reform, which is a topic too big to get into here but involves the scrutiny of scientific and journalistic investigation without the fear of prosecution. (Guardian journalist Simon Singh discusses libel reform here.)
Although Mr Clegg’s talk is mostly general, discussing big ideas and leaving out specifics, I think the principles discussed were hugely important, and it is good to see a further commitment to public data. As a Talisian, it’s great because we work a lot with this kind of data, and it means we get to do more interesting things with it. As a citizen, it’s important that we can see more of what’s going on within government, and it encourages me that this is considered fundamental enough to mention alongside freedom of speech and libel reform.
What I’d like to see this year is the specifics, now. What specific things will make publishing public data easier and more thorough?
-
17:33 A New Revolution
» Nodalities
A colleague sharing their experience of visiting Ironbridge, promoted as "The Birthplace of the Industrial Revolution", helped clarify some thoughts I have been brewing to help convey where the current Linked Data enthusiasms and initiatives may lead us.
The famous Iron Bridge, opened in 1781, spans the River Severn in Shropshire, England. To quote the Wikipedia "It was the first arch bridge in the world to be made out of cast iron, a material which was previously far too expensive to use for large structures. However, a new blast furnace nearby lowered the cost and so encouraged local engineers and architects to solve a long-standing problem of a crossing over the river." The raw materials of iron ore and coal had been known for a long time, but it took the building of a nearby furnace, using the innovation of coke as a fuel, that enabled the local community to invest in the construction. The outcome was not only to stimulate the local commercial and administrative economy, but it also became an 18th century tourist attraction, which it continues to be today.
All very interesting, but what has this to do with Linked Data and its future? The impact of Linked Data and the Web of Data it enables, on the way we interact and do business, will be greater than that of the World Wide Web that it builds upon.
When you make statements like that, you are often asked to justify yourself. As you may know, I like to use analogies to help clarify things and I believe the Industrial Revolution is a good one in the case of the future for Linked Data and associated techniques. I am also very aware that analogies tend to fall apart if you pick at the detail too much, so please bear with me on this one.
Like the Industrial Revolution, Linked Data is building on what went before. Before the Iron Bridge, there were other bridges, roads, and uses of iron. Before Linked Data there was/is the Web – a globally distributed web of linked human-readable web pages, upon which are surfaced words and images for our information, entertainment and commercial desires. Data of course plays its part, often powering the websites that we all consume.
So what is so special about a Web of Data? – The data comes out from behind those websites to be linked with other data across the web, or maybe an intranet. Using the same techniques for linking pages together [the URL], data identifiers are given URIs. This means that a piece of data is given an identifier that is addressable across the web and therefore linkable with other data identified in a similar way.
So where does the Industrial Revolution analogy kick in? Well, once data are identifiable in a globally distributed context, they can be linked, mixed, mashed, and generally used to add value to each other. Your data can become the raw material for someone else’s process – your Wikipedia comment about an animal can become the description on a data-powered BBC page about that species. As with coal, which after some refinement can become coke to be used to add value to the iron smelting process, any published data can be the raw material for value adding/combining processes. The processor utilises their knowledge, skills, and experience to produce an alloy of data, the combination of which is greater than the sum of its parts.
In the same way that some freely available elements, such as the air pumped in to that blast furnace, were needed to get the process going; freely and openly available data, such as governments and the media are publishing, are priming the pumps of a data revolution. Whenever there is value to be added in a process there is both community and commercial opportunity. Once people start using their skill and understanding of a facet of knowledge, to link data from one free, or commercial, source with more free or commercial data they can produce either a saleable result, and/or an enhancement to their own services. The output of one value-add process can then become one of the sources for yet another, and so on.
To finally stretch my analogy just a little further – looking back to those early days in the Severn Valley, it is possible to identify the building-blocks that led to commercial steel production, the age of steam, the automobile industry, and space flight, most of which would have been unthinkable to those early pioneers. Pre-1994, could we have predicted the growth of Google, YouTube, Wikipedia, and Twitter? In 2010 can we identify the building-blocks of a data revolution? – I think maybe we can.
So how will such a revolution, underpinned by Linked Data, change the way we interact and do business, more fundamentally than the Web has? - By creating whole new communities and industries to connect, supply, trade, enhance, distribute, interpret, and build services and applications upon a supporting web of globally available data elements and alloys.
-
13:15 What place for libraries in a Linked Data world?
» Nodalities
I presented What place for libraries in a Linked Data world? at Online Information in London on Wednesday 1st December. The slides are available on Slideshare, but I’ve now recorded some accompanying audio to enable those who weren’t at that event to make better sense of the presentation. In considering the question what place for libraries in a Linked Data world? I have embedded some preliminary research on the applicability of a Linked Data approach to intertextual relationships between literary works. I hope to research this area further, but related to the question at hand, it fits in nicely as a case study.
-
10:49 Challenges and Opportunities for Linked Data
» Nodalities
Yesterday I gave a short talk at Online Information 2010 titled “Challenges and Opportunities for Linked Data” (abstract). The presentation highlighted what I saw as the main challenges that face us as we grow the web of data, and highlighted some opportunities for organisations that want to get involved.
I believe there will be video from the various presentations online at some point, but wanted to post a transcript of what I said (or had planned to say!). The slides are up on slideshare if you’re interested, although they’re largely just transitions to highlight my main themes.
Introduction
2010 has certainly been the year of Linked Data. I’ve been working with RDF and Semantic Web technologies for about 10 years now, and it’s clear that the last 12 months have been one of the critical growth points for Linked Data and the semantic web as a whole. There has been more debate, engagement, and publication of data than ever before.
This is in no small part due to the fantastic work that has taken place at data.gov.uk. The project has not only championed the approach but also led the way as an exemplar for how to do this stuff really well. The adoption of RDFa by Facebook, Google and others has also created a much needed feedback loop that is driving the publication of more structured data.
But as the technology grows we’re starting to experience growing pains which are presenting challenges for further growth and adoption. I think we’re also getting a sense of the opportunities that may arise from the web of data. I picked out three key challenges to review in the presentation.
Craft
The first of these relates to what I’d call “the craft” of Linked Data. To date the growth of the Linked Data cloud has largely been driven by skilled artisans, from academia and a small number of commercial organisations, who know how to work with the technology, how to use and manipulate the data that is already available, and how to get things online and linked together in a way that achieves the 5 star approach.
To scale beyond the initial Linked Data community we need to move from an artisan-led approach and enable “journeyman” developers to achieve the same things. There are several facets to this skills transfer.
Tooling is clearly one important area. It’s a truism that Linked Data tools aren’t as polished as they might be. After all it’s still a relatively new technology area. The majority of Linked Data artisans have been happy enough either to make their own tools or to work with a disparate selection of tools to get the job done. But there is still a lot more work to do in creating a more integrated toolkit that journeyman developers can reach into to help them quickly and easily publish data.
To be fair though, I think we’ve needed these past few years of publishing and experimentation to really highlight what those basic tools might be.
The other aspect of craft is education and training. There’s still a relatively small community with deep skills in this area, so thought has to be given to how those skills can be passed on more widely. Having helped train and advise a number of teams and organisations over the past few years, most recently as part of our consulting work at Talis, it’s clear that there’s a journey or apprenticeship that many teams and organisations undertake as they begin to experiment and gain experience with the technology.
Within the Linked Data community we need to prioritise the work on these tools and services to make it easier for others. We also need to devote additional work to help nurture or define more standard vocabularies for publishing specific types of data. In my opinion this is the really challenging work: it’s not as fun or exciting as publishing the next new dataset or exemplar, but it’s absolutely necessary to push things to the next level. It’s going to take real commitment from all of us.
In my mind there is no better way to help pass on the skills of the initial artisan community than by encoding that knowledge in the form of tools, vocabularies, best practices and design patterns.
Fuelling Applications
Linked Data isn’t being used as much as it could or should be. Why is this?
I think there are two reasons. The first relates to my previous point about enabling the “journeyman” developer. Right now it takes a certain amount of skill to get the most from Linked Data and SPARQL. This presents a road-block for developers who may be interested in using some of the available data. It may even stop them looking at all.
To solve this we must be ready to meet people half-way. Publish simple JSON formats alongside the RDF. Use the Linked Data API created for data.gov.uk to provide simple RESTful APIs into your RDF data. Choice opens up more integration opportunities as well as encouraging engagement. The power of SPARQL and other tools is fantastic, but that power is not needed by every developer in every application. Be inclusive when opening up data.
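One way to picture "simple JSON alongside the RDF" is the sketch below, which flattens a handful of triples into one plain JSON object per resource. It is only an illustration under stated assumptions: the triples and URIs are invented, the `_about` key and the `local_name` shortening rule are choices made here, and the sketch is not the output format of the Linked Data API itself.

```python
import json
from collections import defaultdict

# Invented (subject, predicate, object) triples with placeholder URIs.
TRIPLES = [
    ("http://example.org/id/payment/1001", "http://example.org/def/payee",
     "http://example.org/id/supplier/acme"),
    ("http://example.org/id/payment/1001", "http://example.org/def/amount", "742.50"),
    ("http://example.org/id/payment/1002", "http://example.org/def/amount", "95.00"),
]

def local_name(uri):
    """Use the last path or fragment segment of a property URI as a short key."""
    return uri.rstrip("/").replace("#", "/").rsplit("/", 1)[-1]

def to_simple_json(triples):
    """Group triples by subject into plain objects a web developer can consume."""
    resources = defaultdict(dict)
    for subject, predicate, obj in triples:
        resources[subject]["_about"] = subject  # keep the full URI for linking back
        # Simplification: a repeated property overwrites the earlier value.
        resources[subject][local_name(predicate)] = obj
    return json.dumps(list(resources.values()), indent=2)

simple = to_simple_json(TRIPLES)
print(simple)
```

A developer who never touches SPARQL can still follow the `_about` URI back to the full RDF description when they need more.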
A potentially larger issue is that much of the data available as Linked Data is either static, irregularly updated, or already available in other more accessible formats and APIs. This isn’t true across the cloud as a whole, but timeliness is an issue in many areas. It’s a consequence of the early boot-strapping process which emphasised conversions of available data dumps, and the wrapping of existing APIs and services. As a boot-strapping process that has been fantastic. But it’s not driving engagement: why use data if you can get it somewhere else easier, and in a more up to date form, using tools that you’re already familiar with?
I also think that this contributes to the reason why it has been difficult to show the power of Linked Data: many of the demonstration apps could easily have been built with other APIs. I think this could be on the cusp of changing, as there is now a critical mass of information available to do some powerful queries, and an increasing amount of data is now becoming primarily available as Linked Data.
The challenge we face is changing the nature of the Linked Data cloud from what is a largely static and slow moving environment to one that is much more lively and real-time.
Sustainability
The third challenge I highlighted was sustainability. It’s easy to look at the Linked Data diagram and think: “Well, those bits are done, all we need to do is look how to grow the diagram. We just need to add more data”. I think that’s a natural but unfortunately misleading viewpoint: we need to look carefully at our foundations.
Not all of these sources are on infrastructure that could support real, high volume usage. And few of the datasets are clearly licensed. I’ve personally encountered a number of occasions where some significant datasets are offline or unavailable. So we need to be realistic about whether people can build a stable, commercial application against the web of data as it exists today.
Again to solve this, we need an increasing number of primary sources, making high quality data available on a regular and timely basis, backed by the ability or commitment to deliver those services at the scale we will all eventually require.
In reality this challenge isn’t unique to Linked Data. It’s largely true of the web as a whole; after all not every web site or application is intended to scale to high volume usage. But we’re now talking about a potentially much deeper integration between different applications. We can see the same issues occurring around APIs and data access in general. In recent months there have been a number of stories of developers scrabbling to adapt as APIs get changed, taken down, restricted or re-licensed, leaving them high and dry.
To me the beauty of Linked Data, and RDF specifically, in this regard is that it is so much more portable than any other format. This means that we can easily replicate data to share the load of providing access. With Linked Data we have the option of federating or sharing data across the web. (One of the reasons we started the Talis Connected Commons scheme was to help create sustainability around Public Domain datasets.)
The portability of RDF also makes it easier for a range of organisations to offer scaleable value-added services over the same datasets. For the first time we can decouple the curation of data from the delivery of services over that data.
So those are my three challenges. I think these are largely point in time issues, but we’re going to have to work at them to move forward.
What about the opportunities?
Become a Hub
One of the interesting properties of the Linked Data cloud diagram is how it clearly illustrates the emergence of a number of hubs, like DBpedia, that form the focal points for links from a number of different datasets. If you look closely you can also see that there are emerging hubs within specific subject domains.
I wonder whether the hubs that we see today will continue to play such a key role as the web of data evolves? My feeling is that in a few years’ time the picture and connectivity is going to be quite different, particularly if we continue to see engagement from government and other sectors.
There is clearly an opportunity here for organisations who are already key enablers within a particular sector to become a linking hub on the web of data.
If you poke around in any industry, it’s not hard to find organisations who act as the “switchboard” for that particular sector, either because they manage some key identifiers for the sector as a whole, or because their identifiers and systems have become de facto standards for achieving interoperability. It would be a natural step for those organisations to carry that role forward to the web of data, retaining that key position.
Clearly not everyone can be a significant hub like Dbpedia. But every organisation can act as a hub for its community of customers, partners and users.
The reasons and benefits for doing this are well documented: opening up data can drive new business, innovation, and traffic. Success on the web involves giving your organisation the greatest possible surface area and points of attachment. Linked Data is an excellent way to achieve this as it emphasises the right forms of web integration.
Turn Identifiers into Channels
Linked Data requires you to assign URLs to identify things: people, places, events, whatever. Generally we tend to focus on how that is an important step to publishing data: concentrating on the mechanics of what makes a good, stable identifier and highlighting how this becomes a key way for other people to find your data.
What this misses is that those identifiers can also become channels, or hooks, for your organisation to find other people’s data. Once you have published Linked Data and it becomes linked to by other datasets all of that external data annotates and enriches your own, providing valuable and useful context. Linking data creates network effects, and everyone in the network benefits. That includes you.
The external data is easily accessible through link discovery so it becomes much easier to find, aggregate and analyse it for a variety of purposes. That might be to drive new product features, or to simply power business intelligence and analysis within the enterprise.
I tend to think of it as being able to fish the web of data for useful context. Your URIs are the hooks. Your data is the bait.
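As a rough illustration of the "hooks" idea, the sketch below treats statements as plain (subject, predicate, object) tuples and picks out the external statements that point at your own identifiers. The namespace, the example URIs and the function name are all invented for illustration; a real application would more likely use an RDF library and proper link discovery than in-memory lists.

```python
# Hypothetical namespace under which "your" identifiers are published.
MY_NAMESPACE = "http://data.example.org/id/"


def inbound_links(external_triples, namespace=MY_NAMESPACE):
    """Group external statements by the local URI they point at."""
    hooks = {}
    for subject, predicate, obj in external_triples:
        # Keep only statements made elsewhere about one of our identifiers.
        if obj.startswith(namespace) and not subject.startswith(namespace):
            hooks.setdefault(obj, []).append((subject, predicate))
    return hooks


# Made-up statements standing in for data harvested from other datasets.
external = [
    ("http://reviews.example.com/r/1",
     "http://purl.org/dc/terms/subject",
     "http://data.example.org/id/product/42"),
    ("http://stats.example.net/obs/7",
     "http://purl.org/dc/terms/references",
     "http://data.example.org/id/product/42"),
    ("http://stats.example.net/obs/8",
     "http://purl.org/dc/terms/references",
     "http://other.example.org/id/thing/9"),
]

for uri, links in inbound_links(external).items():
    print(uri, "is referenced by", len(links), "external statements")
```

Each key in the result is one of your own URIs acting as a hook; the values are the external resources that have attached themselves to it.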
I stopped to draw a parallel here with some comments made by Dion Hinchcliffe in his opening keynote. Hinchcliffe pointed to the rise of a number of startups and tools supporting analysis of data collected from the open web, perhaps mixed with data from internal enterprise systems. The end result of that analysis is new data and insights that will need to be integrated into an organisation’s core systems, especially if the intent is to drive more than just management reports.
My prediction was that over the next 12-24 months we’ll begin seeing third-party organisations of this type not just offering SaaS access to analysis systems, but direct insights that are already integrated into a customer’s data via the public identifiers it’s sharing as Linked Data. This has huge potential value and can completely change the costs and approach to data integration.
The time scales may be completely off. But there’s a real opportunity there in my opinion, particularly for organisations that do market and social media analysis.
Data as a Service
It’s been said before but it’s worth repeating: Linked Data isn’t necessarily Open Data. The technology is not at odds with exploring business models around data services or access.
The “Data as a Service” (DaaS) idea is gaining momentum in a number of different areas with an increasing number of commercial APIs coming online. We should also soon be seeing commercially available services directly powered by open data sources or through mining those sources.
There are a number of different business models that can be wrapped around data access, ranging from charging for the data itself, through cost recovery for service provision — something that may be relevant for long term usage of government sources — or just charging for delivering reliable, high performance services over open data. There are good reasons why developers may want to pay for reliable services.
Clearly open, sponsored access to data and services will remain an important part of the ecosystem. In fact some level of open data is required to drive the network effects we are seeing around Linked Data: the identifiers and some key metadata need to be open and remain open; but additional “depth” could be available at a premium.
Summing up
I had no big conclusions to draw from my talk as my goal was to highlight the challenges and opportunities ahead. Clearly I could have chosen a different mix but, drawing on my recent experiences engaging with a wide range of different organisations, these are the issues and opportunities I’ve most commonly encountered and discussed.
Do you have a different perspective? Perhaps some ideas about how to face these challenges, or a different view of the immediate opportunities? If so, I’d love to hear from you.
-
6:41 Thanksgiving for Open Government
» Nodalities
On the eve of the American Thanksgiving holiday, millions of people travel to spend time with friends and family. Before I share a meal with relatives, I contemplate the connection between the first thanksgiving and the emerging Open Government movement.
The “First Thanksgiving” celebration in the US was a feast shared by 53 starving pilgrims who survived a brutal winter in New England, and 90 Native Americans. The Native Americans knew how to manage their land and waters to provide sufficient fish, meat, vegetables and fruit.
The connection between the first American Thanksgiving and Open Government has to do with adapting to a new world by sharing information. Four hundred years ago, the Native Americans shared information on seeds, crops and planting conditions, helping the pilgrims survive. Today, sharing information via the Web is helping us to better understand climate conditions, our health care options and issues impacting our local community.
Last week I joined about 250 people at the first International Open Government Conference, hosted by the US Department of Commerce in Washington DC. Approximately half the conference delegates were from government, the balance from academia and the private sector. The speakers discussed Open Government projects underway in the US, UK, Australia, New Zealand and Brazil. Speakers shared success stories and areas for future development. The common theme: democratizing public sector data and driving innovation. Jonas Rabinovitch from the United Nations Department of Economic and Social Affairs highlighted several eGov strategies in developing nations. Mr. Rabinovitch noted that all but three UN member nations have a basic Web presence, many offer online forms and some provide the ability to perform transactions via the Web.
Given the conference was hosted in the US Department of Commerce, data.gov featured prominently. “The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.” Seven countries have stood up Open Government sites in the last 18 months, including UK, US, Australia, New Zealand, Canada and Finland. Government administrators are seeking to restore public trust and establish an environment of transparency, participation and collaboration with the public.
The US Administration launched its Open Government Initiative in April 2009. In the last two years, I’ve watched the US Executive Branch begin to move from a “need to know” to a “need to share” culture. This cultural transition, and thus this Open Government Conference, was truly historic. The conference underscored to me that we all, regardless of our political views and affiliation, live in a highly interconnected global economy, underpinned by the World Wide Web.
Respected advisors on Open Government initiatives including Professor Jim Hendler of Rensselaer Polytechnic Institute and Sir Tim Berners-Lee, Director of the World Wide Web Consortium, agreed that public participation and collaboration will be key to the success of Open Government initiatives. I believe that more conferences like this one and the Open Government Data Camp 2010 held in London last week, drawing delegates from a variety of disciplines, from several countries, will do a great deal to reinvigorate civic engagement and economic growth from the ground up.
Government employees are responding to mandates to publish content to Open Government websites. Data.gov was launched in April 2009 with 47 data sets. Vivek Kundra, U.S. Chief Information Officer, stated that data.gov has in excess of 300,000 data sets as of November 2010. A large portion of the data.gov data sets are geospatial information, which is an opportunity for scientists and entrepreneurs to build tools for analysis and visualization of this valuable data. The UK Government has published over 4,600 data sets, including many from Great Britain’s national mapping agency, Ordnance Survey, providing the most accurate and up-to-date geographic data for the UK.
“The stakes are high for our interlinked global economy.” Dr. Robert Schaefer, Deputy Project Scientist from Johns Hopkins University Applied Physics Lab, gave a compelling presentation on the need for mechanisms to make sense of published data as Linked Open Data. Publishing the content in RDF is not sufficient; rather, providing context on what the data implies is necessary. Better tools for analysts and scientists to extract meaning from Linked Open Data will allow critical information on climate change and space weather, for example, to be more readily understood by policy makers. Professor Schaefer stated the implications for climate change are serious, wide-ranging and urgent. Current CO2 emissions are higher than the Intergovernmental Panel on Climate Change “worst case” scenario. Billions of people may experience serious consequences from climate change. Professor Schaefer reiterated the need to get started as soon as possible. “When the water from the sea rises, millions of people will have to move.” This international conference will hopefully stimulate cooperation between the public and private sectors. It is a critical step in making data accessible and providing decision support tools for space weather and climate change.
Mr. Kundra acknowledged we have much more to do to improve the quality of published data sets. He said, “when I’m able to perform analytics on the fly, grounded on quality data, we will have achieved success.” Delegates were encouraged by Mr. Kundra and other speakers to build out communities of interest, led by individuals rather than government agencies. The US Government is regularly launching challenges (see [www.challenge.gov]) with modest cash prizes, targeting citizens to gain insights on how we, the people, not government, can solve problems ranging from education on childhood obesity to sustainable urban housing that respects the environment.
Beth Simone Noveck, United States Deputy Chief Technology Officer for Open Government, leads President Obama’s Open Government Initiative. Based at the White House Office of Science and Technology Policy, she is an expert on technology and institutional innovation. Ms. Noveck stated that “the Open Government Initiative is not transparency for transparency’s sake. It is through participation and collaboration with academia and the public sector that there is value.” Creating partnerships to use Open Government Data for important and unforeseen uses is empowering individuals with the ability to make better decisions and affect our quality of life.
We are in the very early stages of making Open Government data available as Linked Data. However, there are many good reasons to support Open Government initiatives, including accountability in spending, improved health care provision, and addressing climate change and space weather, which affect the world’s population. The international data exchange standards are now in place. While experts will continue to refine the technical underpinnings and best practices will evolve, the citizen-led movement, assisted by government, is truly underway.
Bright young geeks are increasingly involved in American civic life through non-profit organizations like Code for America. Passionate entrepreneurs like Dan Melton show that being super bright and engaged at a grassroots level in government is both hip and necessary. Code for America recruited twenty “fellows” from 362 applicants to get involved in city projects in 2011. One example discussed was the Boston Project, whose idea is to bring information on students together and create interesting applications leveraging federal census content, student data, transit information, and city and state data.
Each month new mobile applications and social networking solutions are made available. These are not expensive, government top down initiatives, rather, they are coming from the ground up by military personnel, students, local government officials, publishers, scientists and citizens who value transparent government. An interesting mobile app for Android, iPhone and the iPad was unveiled for the New York Senate. It is a real-time constituent mobile dashboard to the legislative process allowing citizens to connect with Senators, find and comment on bills, review votes and transcripts.
Academics are doing innovative research. Grad students and post-docs are rapidly prototyping what the new world of open data will look like. An increasing number of software companies, including my employer Talis, are producing lightweight platforms and cloud computing solutions. Thousands of smart people have been creating the foundation of the Linked Data “ecosystem” in the form of International Data Standards and best practices over the last fifteen years, largely through the important work of the World Wide Web Consortium (W3C).
The availability of improved development tools is seen as a requirement for widespread proliferation of Semantically enabled applications, however, people are leveraging international standards such as RDF for Linked Data, content sharing models, well-documented licensing models, and existing best practices. Fully 25% of the applications shipped on a new Apple iPhone use government produced content.
I believe there are significant opportunities for commercial software firms to produce services and products to visualize data sets, find related data sets and, most importantly, provide mechanisms as easy to use as the early Web to publish machine and human readable data as Linked Data. There is a burgeoning information economy rapidly forming around provision of public and private data mixed together in novel ways. I believe that in 2011, truly useful tools for Web developers to create compelling Linked Data applications will be available for use with Open Government data.
We should all acknowledge that data will never be 100% perfect. Real data is dirty, face it. Yes, concerns will linger about misinterpretation and inappropriate mashups until people gain experience in making informed decisions based on the data presented. Be patient and don’t expect it to be perfect on day one or even year one. Allow best practices to emerge from the ground up, by communities of interest. Issues of data quality, provenance, context and important elements such as units of measure will all be addressed as Linked Data becomes more mainstream. Harvard Business School published a blueprint for use of open government data. The W3C provides lots of useful guidance on eGovernment and Linked Data activities.
Just as the early American pilgrims experienced miscalculations in weather and agriculture, they eventually figured out how to plant seeds correctly and increase their potential for a bountiful harvest. Through information sharing and discussion by informed citizens, the US evolved a free and democratic form of government that is admired by millions of people around the world.
I’m optimistic that the citizens of the world will leverage Open Government initiatives for positive outcomes. The more our governments support openness and transparency through Open Government initiatives, the more we, the people, can solve issues that matter at the community-level or on a global level. The stakes are high and we should be grateful and cooperate to harness the power of Open Government data and the Web. We are defining our history, as well as our future, today.
-
15:29 Working on Plings
» Nodalities
It’s always good to work on projects that aim to make a difference and to contribute something: you could say we look for projects with some substance to them. So, it’s been fun to work with social research company, Substance on their Plings project. If you’ll forgive the opening pun, I’ll explain a bit.
The Plings project aims to gather together the best available information about “positive activities” for young people: PLaces to go + thINGS to do = PLINGS. Substance describes Plings as: a search and discovery tool that helps people to find accurate and trusted sources of information about positive activities for teenagers. So, I can look for Plings around Talis’ Birmingham offices, and find out about football coaching, cafes, dance and musical projects: all happening within a set radius of my postcode. It’s a versatile tool, letting the searcher facet their results and customise the display, and it also ties in with social networks (check out the fantastically-named “boredometer”, for example) and devices.
Feeding the Plings site is a dataset comprising two main parts:
- Data on the actual activities: places to go, things to do
- Data on feedback relating to the activities: “Plingback”
Substance uses various methods to collect the first dataset, routing it through their own API. This lets them use data from many different formats and shapes: from local authorities, third sector and community groups and the private sector. For Plingbacks, though, Talis has been working with Substance to create an infrastructure that can be used to generate data in RDF which Talis hosts through its Managed Service. There is more detail about the Plingbacks app on appspot, too.
In short, the Linked Data approach enables Substance to have multiple Plingback widgets that can be presented through multiple channels. Because they all share the same API and data structure, they can use the Talis Platform to query and visualise the data dynamically.
Substance’s Steven Flower also told me a bit about a related project building on the back of Plingbacks and the Talis Platform called Plingalytics: a sort of dashboard enabling local authorities and stakeholders to get a very useful view of the Plings datasets. It will let them answer questions like: “How many Plings do we have on a Friday night?” or simply: “What’s hot? What’s not?”
This ties in with another side of Plings, which works with local authorities to “fulfil their statutory duty to publicise and keep up to date comprehensive and accurate information on positive activities for young people and to make it accessible,” according to Substance’s site.
It’s an exciting project to be working on, and I’m very interested in the way it ties in local government, young people, and activities through a very positive use of the Web. The fact that they’re using Linked Data to back the interrelated data makes a lot of sense, and we’ve been working together for a long time pulling together Linked Data opportunities and matching them with solutions. Alongside looking after the Plingback dataset on the Platform, our consulting team has worked with Substance to model and convert their data to RDF. In addition, and because of the open nature of the data Substance is producing, Plings is able to make use of Talis’ Connected Commons scheme for some of its data: meaning that not only can this information be managed free of charge, but it’s available on an open data licence.
Steven Flower said: “We are very excited about this. From a technical point of view, the opportunity to build this upon Linked Data sets is also interesting. Hence, we have chosen to work with Talis for the infrastructure, knowledge, support and enthusiasm that they bring.
We have had the support of Talis since early days of Plings, so it’s good to continue.”
More information on the Plings project from Substance can be found on their Plings info page.
-
19:00 Introducing David Wood
» Nodalities
Talis’ new subsidiary, Talis Inc., was announced just one busy month ago. As Talis’ new chief geek in North America it is high time for me to introduce myself to the Nodalities community.
I am a software engineer with a long history working on the Web, developing Semantic Web and Linked Data infrastructure and building Web standards. At Talis, I will be helping to define our entry into the North American market and contributing to the technical direction of the Talis offerings.
Talis is a wonderful company to work for. The environment is extremely collegial and pleasant, although also very productive. I look forward to contributing to both the company and its communities, and to bringing some of the Talis culture into North America.
Talis Inc’s CEO, Bernadette Hyland, and I have started several companies together including Semantic Web startup Tucana Inc (sold to Northrop Grumman Corporation in 2005) and, more recently, the SemWeb consultancy Zepheira. Much of our previous work has been released as Open Source Software, including the Mulgara Semantic Store, a popular Semantic Web database, the Persistent URL software and, most recently, a project aimed at making Semantic Web and Linked Data applications easier to create, called Callimachus.
The success of the Web is based on standards of the World Wide Web Consortium (W3C), so I have tried to help them whenever possible. I co-chaired the Semantic Web Best Practices and Deployment Working Group and, more recently, the RDF Next Steps workshop. In 2011, I’ll be co-chairing a new working group aimed at updating the Resource Description Framework (RDF), the technical standard underlying the Semantic Web and Linked Data.
The growth of Linked Data has led to some truly interesting applications. I’ve been working with many others to collect some of those use cases into book form. The goal is to help others replicate those early successes. The first book, Linking Enterprise Data, has just been published and is available freely on the Web. It may also be purchased in ebook and printed form. A second book is to be entitled Linking Government Data. We are currently seeking contributions, so please contact me if you have a good story to tell about the use of Linked Data in government settings.
I occasionally teach computer science and mathematics courses at the University of Mary Washington in Fredericksburg, Virginia. Most recently I’ve taught Computer Networking and Introduction to Discrete Mathematics. It looks like I’ll be teaching an upper division elective on Linked Data during the summer of 2011. I thoroughly enjoy working with university students. Theirs is a fascinating time of life, when they choose how they see themselves as individuals and what they will (at least initially) do for a living. They also help me keep a fresh perspective on our rapidly changing world. Seeing the Web through their eyes is really very different than seeing it through mine.
We, as a community, built the Web. We continue to build our community as we build the Web. I look forward to being on the journey with you.
-
14:42 “Linked Data” at the Guardian
» Nodalities
Nodalities Magazine article by Martin Belam.
During October at Guardian News & Media we announced a change in our Open Platform Content API. For the first time, developers and users could query our database of over 1 million content items by using the common external identifiers of a MusicBrainz ID or an ISBN number. It is our first step into the world of ‘Linked Data’.
The Open Platform Content API was launched as a beta in 2009, and earlier this year was launched as a commercial product, allowing partners to re-use Guardian & Observer content in a variety of different ways. There is, for example, a Wordpress plugin that easily allows you to include Guardian content in your blog, and developers have built applications like a bespoke recipe search on top of the data. It is a unique proposition amongst news organisations on the web, and as well as the Content API itself, the Open Platform also includes publishing the source data behind Guardian journalism on the Data Store, and providing a search engine for Government datasets from around the world.
Why linked data at Guardian News & Media?
The addition of linked data to the API is the culmination of a great deal of work behind the scenes to get the data prepared, and to work out the right way to make it available. Personally, I had been struck the first time I saw the linked open data cloud diagram that none of the bubbles represented any of the UK’s traditional print news organisations. With our combined centuries of experience sifting, collating, organising and publishing information, it seemed to me that they should in fact be occupying a central position on that map. The principles of linked open data also chime with the over-riding principles we have about our web presence at Guardian.co.uk. We strive to be ‘of the web’, not just on the web. That means reaching out and embracing external services and data, and our intention is to have permanent, predictable URLs for all of our content.
The first challenge to implementing this was to pick stable and reliable external datasets that would form a permanent and meaningful relationship with our content. We decided that a focus on distinct cultural entities would work, and avoided the messiness of trying to decide whether a story was ‘about’ something, or whether it just ‘mentioned’ something. MusicBrainz IDs and ISBN numbers seemed like datatypes we could work with.
The domain model of our content already had a concept of an ‘external reference’ that can be added to a tag or a factbox or an article. We have previously used that to link articles to a page about a specific film, or to link a sports match report to game statistics provided by a third party like Opta. The obvious route was therefore to expose these ‘external references’ in our API.
MusicBrainz IDs
With MusicBrainz IDs, we did not attempt to tag all of our music story archive. There are around 42,000 music content items currently on our site, and to accurately add MusicBrainz IDs to them would be an arduous task. Fortunately, because of our domain model, we had a shortcut to tagging this content. All of the items in our database are given tags. These indicate the type of content (e.g. article, audio, video), the tone of content (e.g. news, comment, review, obituary), the contributor who produced the content, and keywords representing the subject the content is about. In the Music section, we have around 600 of the artists we write about most frequently who exist as keyword tags. The quickest route to adding MusicBrainz data was to add it to these artist keyword tags. The actual job of tagging was achieved via the rather dull mechanism of filling in a Google Docs spreadsheet, although developer Daithi Ó Crualaoich built a tool to help us. He came up with a quick browser-based hack that simultaneously put the same search string across our music tags and across MusicBrainz, and matched the outcome. A script then uploaded this to our database.
ISBN numbers
ISBN numbers were another obvious choice for us. The majority of our book reviews on the web feature a ‘fact box’, giving details of the publication and a corresponding link through to our book store to make a purchase. This ‘fact box’ frequently includes the ISBN number of the publication, and so exposing them as a search criteria was not a massive undertaking. Nevertheless, as with our music content, we do not have universal coverage. At the time of launch around 2,500 reviews out of a possible total of 17,000 had ISBNs attached to them. This is part of the production process now, and so all reviews going forward should have the ISBN added.
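The article does not say how the tag-matching hack for MusicBrainz IDs worked internally, so the following is only a guess at the general shape of such a step: normalise the names on both sides and accept a match only when it is unambiguous. The function names and the sample tag list are invented; the one ID shown is the Shirley Bassey identifier quoted elsewhere in this article.

```python
def normalise(name):
    """Lower-case a name and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def match_tags(tag_names, candidates):
    """Map each tag name to an external ID when exactly one candidate shares its normalised name."""
    by_name = {}
    for candidate_name, candidate_id in candidates:
        by_name.setdefault(normalise(candidate_name), []).append(candidate_id)

    matches = {}
    for tag in tag_names:
        ids = by_name.get(normalise(tag), [])
        if len(ids) == 1:
            # Missing or ambiguous names are left for a person to resolve by hand.
            matches[tag] = ids[0]
    return matches


# Illustrative input: keyword tags on one side, (name, ID) search results on the other.
tags = ["Shirley Bassey", "Unknown  Artist"]
candidates = [("shirley bassey", "05ec70a5-3858-4346-a649-fda0a297b8c1")]
print(match_tags(tags, candidates))
```

Leaving ambiguous names unmatched fits the spreadsheet-driven process described in the article, where a person still reviews the result before a script uploads it.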
API query types
The Open Platform supports a range of ways to query this data, and you can find a guide at: [gu.com]. Obviously you can query the API looking for a specific reference, so a query for reference=musicbrainz/05ec70a5-3858-4346-a649-fda0a297b8c1 will return content about Shirley Bassey. Additionally, you can get a list of content which has a MusicBrainz or ISBN attached to it, so reference-type=musicbrainz|isbn will give you content from the API which has a MusicBrainz OR an ISBN added to it. Adding the ‘show-references’ parameter will return a block in your API responses that includes MusicBrainz IDs or ISBN numbers for any item within the list. If you’ve not used the Guardian’s API before, you can get a feel for how it works by using our browser based API explorer.
‘Linked data’ formats
It does seem that as soon as you put the words ‘linked’, ‘open’ and ‘data’ into the same sentence, you automatically invoke a debate about what formats are appropriate to use. At the present time we are making these persistent external IDs available alongside our content items in both XML and JSON formats. And yes, that does mean that we have steered away from RDFa and SPARQL.
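To make the query parameters described under ‘API query types’ concrete, here is a minimal sketch that assembles query strings from them. Only the parameter names (reference, reference-type, show-references) and the example values come from the article; the base URL is a placeholder, the value passed to show-references is an assumption, and the helper function is hypothetical, so check the API guide before relying on any of it.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute the real one from the API guide.
BASE_URL = "https://content-api.example/search"


def build_query(reference=None, reference_types=None, show_references=None):
    """Build a query URL from the reference-related parameters named in the article."""
    params = {}
    if reference:
        params["reference"] = reference
    if reference_types:
        # Several types are combined with "|", meaning "any of these".
        params["reference-type"] = "|".join(reference_types)
    if show_references:
        params["show-references"] = show_references
    # Keep "/" and "|" readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe="/|")


# Content about one specific artist.
print(build_query(reference="musicbrainz/05ec70a5-3858-4346-a649-fda0a297b8c1"))

# Content carrying either kind of identifier, with the references included in the response.
print(build_query(reference_types=["musicbrainz", "isbn"], show_references="all"))
```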
From our point of view there is a clear reasoning behind this. We try to work in a lightweight and agile way, and providing the data in this format was the simplest way to meet our immediate requirements. We are trying to concentrate on making more metadata available. If we were to decide to invest in triple-stores and implement a SPARQL endpoint first, then I’d wager that we would still be waiting to dip our toe into the water.
Moreover, it would be wrong to commit our editorial production colleagues to tagging up all our content with this extra layer of semantic data, if we can’t show the benefits. It is my hope that by incrementally releasing extra layers of linked data through our API, in a simple way, we can see what works and what doesn’t, and what types of data interest people and inspire them to develop applications using the data.
As I’ve personally argued before, particularly in response to Tom Coates’ recent call for “Death to the Semantic Web”, I’m entirely agnostic about formats myself. What I think is most important is that we provide consistent, RESTful, predictable, persistent hooks into Guardian.co.uk content, in as many ways as possible, with the right licence for re-use.
What next?
We are now evaluating where else we can add value to our API with joins to external datasets. Again we will aim to be pragmatic, tagging the most data with the least effort. And we also want to listen to the linked data community: what are the data joins that would be most useful to external developers?
Martin Belam is an information architect at the Guardian newspaper.
-
10:29 Linked Data – Coming Together
» Nodalities
To quote John ‘Hannibal’ Smith, from that wonderful bit of 1980s TV, “I love it when a plan comes together!”. Of course aficionados of the A-Team will probably remember ‘the plan’ was often only apparent in retrospect, although its general intention was clear from the start.
The adoption of Linked Data, and the realisation of all that potential benefit, is looking a bit like an A-Team episode: the eventual outcome being clear from the start, but with many setbacks, skirmishes to fight, partners to woo, nerves to calm, and teams to lead on the way.
To break the metaphor at this point, I see Linked Data as more of a shared vision than a plan laid out before us. Nevertheless, I think we are starting to see elements of it ‘coming together’.
One very obvious example, is what Ordnance Survey is doing by continuing to open up their location data. Now that OS have defined a URI for every UK postcode unit [eg. ‘SO16 4GU’ = [data.ordnancesurvey.co.uk] ], why would anyone [re-]publishing data in the future not use these identifiers to reference their postcode information? By that simple step they will be linked in with a wealth of ancillary information about the location – easting/northing, ward, district, county, country, etc.
Great, I hear you say, but show me an example of what that could lead to! Being lazy, I’ll let the inimitable John Goodwin of the OS do it for me. In his recent, appropriately named “So what can I do with the new Ordnance Survey Linked Data?” post, he shows how, by merging data from a previous Talis project produced for the Department for Business, Innovation and Skills (BIS), he can deliver a very different way of accessing the same data. The BIS Research Funding Explorer project took data about UK Government research funding from several research councils and the Intellectual Property Office, and brought it together in a Linked Data driven application to display UK centres of research excellence.
John explains how, by mixing Linked Data published for that project with OS Linked Data, he has been able to develop a different way of accessing the data. In his prototype application you are presented with a map of the UK showing the regions as defined by the European Union. By clicking on one of the EU regions you are presented with a list of the projects from within that area. He has also added the ability to access by county or District/Unitary Authority. It is a simple but effective way of demonstrating that data, in Linked Data form, from one source can be easily combined with data from another source to deliver benefit.
Of course even with this example we are seeing the effect of joining just a couple of jigsaw pieces together. With Linked Data, such as this from OS, being published at an ever increasing rate, it will not be long before a bigger picture starts to form as more and more data pieces are linked together.
I love it when you can see a plan coming together!
-
11:21 LOD Around-the-Clock (LATC)
» Nodalities
Guest post by Lin Clark and Michael Hausenblas, DERI
In this, the Petabyte Age, technologists have a growing obsession with data, and with Big Data in particular. But data isn’t just the province of trained specialists anymore. Data is changing the way scientists research and the way that journalists investigate; the way government officials report their progress and the way citizens participate in their own governance.
The challenge that all of these accidental technologists face is how to surface data and bring data together in meaningful ways. As Google’s chief economist Hal Varian has said, the scarce factor is no longer the data, which is essentially free and ubiquitous, but now the “scarce factor is the ability to understand that data and extract value from it.”
The emerging Web of Linked Data is the largest source of this data (multi-domain, real-world and real-time data) that currently exists. As data integration and information quality assessment increasingly depend on the availability of large amounts of real-world data, these new technologists are going to need to find ways to connect to the Linked Open Data (LOD) cloud.
With the explosive growth of the LOD cloud, which has doubled in size every 10 months since 2007, utilising this global data space in a real-world setup has proved challenging; the amount and quality of the links between LOD sources remains sparse and there is not a well-documented and cohesive set of tools that enables individuals and organisations to easily produce and consume Linked Data.
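To put that doubling rate in perspective, here is a minimal Python sketch of the arithmetic. It assumes smooth exponential growth, which the post does not claim; the function name and the 36-month figure are purely illustrative.

```python
def growth_factor(months_elapsed: float, doubling_months: float = 10.0) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_months`.

    Assumes idealised, perfectly smooth exponential growth.
    """
    return 2.0 ** (months_elapsed / doubling_months)


# Illustrative only: three years (36 months) of doubling every 10 months.
print(round(growth_factor(36), 1))
```

Under that assumption, three years of doubling every 10 months works out to roughly a twelvefold increase, which goes some way to explaining why tooling and interlinking have struggled to keep pace.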
A new project aims to change this, making it easier to connect to the LOD cloud by offering support to data owners, Web developers who build applications with Linked Data, and small and medium enterprises that want to benefit from the lightweight data integration possibilities of Linked Data.
LATC to the Rescue
The new LOD Around-the-Clock (LATC) project kicked off on September 13-14, 2010 at the Digital Enterprise Research Institute in Galway, Ireland. LATC brings together a team of Linked Data researchers and practitioners from DERI (National University of Ireland Galway), Vrije Universiteit Amsterdam, Freie Universität Berlin, Institut für Angewandte Informatik, and Talis.
This team will support the production and consumption of Linked Data by providing:
- A recommended tools library for publishing and consuming Linked Data, supplementing documentation for the tools, and free implementation support for large-scale data publishers and consumers. Tools include the D2R Server for publishing relational databases on the Semantic Web, the Drupal CMS and related publishing and consumption tools, and others.
- A 24/7 interlinking platform (see Fig. 1) that acquires new data and creates links between existing datasets in the LOD cloud.
- Publication of new large-scale LOD datasets with data from governmental departments and other organizations. The focus will be on EU level datasets such as CORDIS, the European Patent Office, and Eurostat.
Homepage: [latc-project.eu]
Twitter: @latcproject
Duration: 09/2010 - 08/2012
Total cost: 1.19 M€
EU contribution: 1.06 M€
Further information:
Dr. Michael Hausenblas
IDA Business Park, Galway, Ireland
Tel. +353 91 495730
michael.hausenblas@deri.org
In addition to the core team, a large Advisory Committee with more than 30 members will participate in the LATC activities and connect the Linked Data community to LATC’s recommended tools library and support services. Organizations on the Advisory Committee are entitled to support from the project and thus will be in a position to give feedback to improve the support services. The Advisory Committee includes governmental organisations such as the UK Office of Public Sector Information and the European Environment Agency; researchers and practitioners such as the University of Manchester, University of Economics Prague, Vulcan Inc., CTIC Technological Center, the Open Knowledge Foundation; and standardisation bodies, including W3C (Tim Berners-Lee). The LATC partners will also liaise with other EC projects and related activities, including LOD2, PlanetData, SEALS, datalift.org, Semic.EU, OKFN, and the Pedantic Web group.
LATC organises and supports a number of community events, including tutorials at the International Semantic Web Conference 2010 in Shanghai, China, as well as the Open Government Data Camp, London.
LATC is a Support Action funded under the European Commission FP7 ICT Work Programme, within the Intelligent Information Management objective (ICT-2009.4.3).
-
11:43 Dion Hinchcliffe – Web and Social in the Enterprise
» Nodalities
The opening conference keynote presentation this year comes from Dion Hinchcliffe, Senior Vice President of Dachis Group. Dion is an internationally recognized business strategist and enterprise architect with an extensive track record of building enterprise solutions and strategies for clients in the Fortune 500, federal government, and Internet start-up community.
In this conversation we explore web and social technologies and the impact, challenges, and opportunities they bring when applied to the enterprise.
-
9:17 Focus on Local Government Spending
» Nodalities
The UK Government Transparency agenda is encouraging Local Government as well as National Government to publish its data as Open Data and Linked Data, reflecting the world leading progress that data.gov.uk has made on these fronts over the last year and a bit.
I am sat in the opening session of the Socitm 2010 conference, in sunny Brighton, whilst writing this. Already it is clear that local government spending is a major issue for the sector. In its broad sense, of how much local authorities can [or cannot] spend, it is providing the background for the whole conference. It is not all doom and gloom here though. That IT could be seen as a knight in shining armour to help the public sector deliver better services was the encouraging thought proffered by Louisa Preston as she launched the day. In its more narrow sense, the requirement to publish data about all local government spending items over £500 from January 2011 onwards, it gives a focused example of the opportunity for a significant change in thinking and practice by the sector.
As Nodalities readers are well aware, Linked Data tools, techniques, and technologies have massive potential to simplify publishing, linking, and aggregating data, and making it work across a web of data. It is no coincidence that data.gov.uk is making steady, valuable progress publishing key data sets in linked data form in the Talis Platform – it is an obvious step. For many in local government, linked data is something they have never met before. For them, the traditionally unnatural step of openly publishing what in the past would have been a private report out of the back of their finance system is a significant step in itself.
It is the responsibility of those of us who understand the benefits of taking the extra step, beyond just publishing a simple csv file, of publishing in Linked Data form to make it easy for all authorities to understand and take the combined step of publishing Linked Data from the start.
To that end, we at Talis recently announced a free stores offer for all UK local authorities to publish their spending data as Linked Data.
Traditionally our approach would be to host a free open day to help those in local government understand Linked Data and the benefits to them. Recognising the broader economic climate, and its influence on local government spending in that broader sense, that doesn’t seem to be a good idea.
Many organisations in the sector, not least Socitm (there is a Linked Data session at the conference today) and the Local Government Group, are looking to promote this approach. We are therefore going to work with the sector to promote this message. To that end we are to participate in the Open Data strand of the free Local by Social online conference, 3 – 9 November, being hosted by LGID.
As well as checking out what looks to be a quality online event, stay tuned to the Talis initiatives in this area.
Technorati Tags: Linked Data,Open Data,LGID,localgov,opengov
-
15:33 Talis Inc
» Nodalities
Having moved over to the UK from the States quite a few years ago now, one of the things I noticed about company names was that they tend to use “LTD,” and for reasons unknown, I somehow always thought Talis Inc sounded better than Talis LTD.
Well, I’m very happy to be able to say that Talis Group LTD, will now have a new subsidiary with the excellent name: Talis Inc. The Inc means, of course, that we’ll have a new member of the Talis Group bringing our Platform, managed services and expertise to the United States.
Based in Virginia, Talis Inc will be ably led by Bernadette Hyland, the new CEO of Talis Inc. She will be joined and supported by David Wood as VP Engineering. Together, Bernadette and David bring to Talis a huge amount of Semantic Web experience and a remarkable reputation: both entrepreneurs were founders of Tucana, one of the first commercial triple store vendors, and were most recently at the Semantic Web consultancy Zepheira.
Alongside a new subsidiary comes Talis’ first US customer: the US Government Printing Office (GPO). Talis will be running the GPO’s PURL infrastructure, which provides persistent Web addresses for critical government documents and is primarily used by the more than 1,200 Federal Depository Libraries. The PURL server uses the PURLz open source software, the development of which was led by David while at Zepheira, and complements the data hosting and search capabilities of the Talis Platform with identifier management functionality.
So, please join me in welcoming a stellar entrepreneurial team, our first US customer, and the addition of an Inc to the Talis family!
-
11:41 edit: under development. LOD cloud and Talis’ Datasets…
» Nodalities
Last week, Richard Cyganiak and Anja Jentzsch launched their latest version of the Linking Open Data cloud diagram. You will have seen this diagram, I’m sure, in its various iterations over the years. From the cover of early Nodalities Magazines to the slides of most any Linked Data presentation you care to recall. Richard and his team have done a fantastic job of creating a useful picture of the Linked Data cloud, and its evolution from a few circles and sticks to the complex and massive diagram you can see on Richard’s site.
Richard humorously tweeted the day it was launched: “Did you hear that? The sound of a hundred linked data advocates updating their slides;” and he can’t have been far off. Also fortunately, Richard and Anja have made the LOD cloud available under a CC By-SA license, meaning that not only can Linked Data folk pinch a copy of the LOD cloud for their slides, but they can update and modify it too.
My colleague Rob Styles put together a coloured version of the LOD cloud with a bit of a Talisian twist. Below is the current (as of Sept 2010) LOD map highlighting datasets Talis has been involved with. So, with each of the coloured circles, we’ve created the RDF itself, hosted a Linked Data version of an existing dataset, helped to model the data or provided data access tools for it (like a SPARQL endpoint). It’s very exciting to see, and also surprising seeing this picture in such clear context!
Edit:
It looks like the version I posted earlier was a draft, and the next version will be along shortly.
I should clarify that we are just highlighting where we have helped the Linking Open Data project by offering support, expertise and hosting. The LOD cloud is the collective effort of dozens of organisations and individuals who have worked tirelessly to promote the project. We are proud to be part of such an exciting and growing community.
We’ll put up a new version when it’s been developed a bit further.
-
10:57 Public-sector Pay and Panorama…
» Nodalities
A couple of weeks ago, the BBC asked us to load a set of data into the Talis Platform to support an upcoming episode of Panorama. The episode, which aired last night at 8:30pm, covered public-sector pay. It looked particularly into the topic of the highest-paid public sector jobs, especially the jobs of senior civil servants paid more than the UK Prime Minister.
So, we modelled the data the BBC supplied, converted it into Linked Data and loaded the lot into the Talis Platform. The BBC is pulling data from their Platform stores to power the Panorama exploration tool, which you can use here.
The exploration tool gives you an interactive view of where top public-sector salaries are going, sorting by sector and giving you a facetted picture. So, you can have a quick glance at the top 10 positions in Local Government, then filter down to find those of Wales, or even deeper and have a look at the district councils of, say, the Northwest of England.
The explorer is making use of the Linked Data API—the same thing that works with data.gov.uk—giving their developers the data formats such as JSON which are used in the application. So, whenever you click your way through the explorer, you are pulling at the end of an interesting string of data-driven wheels and cogs; the end of which is all linked up and SPARQLy.
The BBC have taken Linked Data very seriously, and it’s even something that’s influenced the way they’re thinking about information architecture more widely. They’ve built much of the framework behind projects like the Wildlife Finder and their World Cup site on Linked Data principles. For a peek at this world, a great place to start would be Silver Oliver’s recent post about the Semantic Web. And for more about the way this story unfolds, watch last night’s Panorama on BBC iPlayer if you’re in the UK.
-
17:53 Talis Training: Intro to the Web of Data
» Nodalities
Intro to the Web of Data: 21-22 September, 26-27 October
So, we’ve been running a series of Open Days which you can’t have failed to notice here on the Nodalities blog. We’ve covered very broad topics related to the Semantic Web and Linked Data, giving an overview of graph-thinking with data, URIs and some direction.
But the question keeps coming up: “How does my team actually use Linked Data?”
We’ve done quite a bit of training, both bespoke consulting and as a set course, and you can read a bit more about that over on our consulting page. We’re now hosting a series of open-registration training courses: A 2-day introduction to the Web of Data.
The course provides an in-depth introduction to all of the core technologies that a developer will encounter when working with and publishing Linked Data. It includes a thorough introduction to the RDF model; modelling of data using RDF Schema; publishing of data to the web as Linked Data, and querying RDF datasets using SPARQL.
We’re offering a discounted early-bird price for the first two courses of £1,000 per attendee (ex VAT) if booked before 1st October. We’ll be putting on lunch and our now-famous SPARQL blend coffee from Union Hand-Roasted Coffee, too! Places booked after 1st October cost £1,200.
The first course will be on 21 and 22 September at No 76 Portland Place, London. The second will be at our offices in Birmingham on 26 and 27 October.
-
8:18 Linked Open Data and Pavlova
» Nodalities
If Sir Tim Berners-Lee can equate Linked Data with a packet of crisps/potato chips, I thought I would take a stab at another food metaphor for this post. Linked Open Data (LOD) is a concept that many believe they understand. Take yourself to most any conference that has a connection with data, or the web, or the Internet at the moment, and it will not be long before you see a slide of the Linked Open Data cloud diagram, or of Sir Tim imploring us to give him our raw data now, or if you are very lucky a shot of him doing his imploring whilst stood in front of a shot of the LOD cloud.
Simple really, just publish your data as Linked Open Data and all will be wonderful as we move towards the sunlit Semantic Web uplands. Unfortunately life is never that simple – LOD is not a single identifiable thing. As Paul Walk eloquently puts it:
- data can be open, while not being linked
- data can be linked, while not being open
- data which is both open and linked is increasingly viable
- the Semantic Web can only function with data which is both open and linked
As with any recipe for success, the majority concentrate on the final result, praising or criticising it as a whole without identifying the benefits, or otherwise, of the individual ingredients.
Take a strawberry pavlova for instance. If you are into that kind of thing, it is a delightful culmination of the culinary arts designed to send your taste buds into raptures. Unless, that is, you don’t like cream, or you don’t like strawberries, or can’t abide meringue, in which case the whole thing seems a little pointless.
What has this got to do with Linked Open Data (LOD), I hear you ask. Well, I am increasingly seeing LOD being presented as the goal for those wishing to publish their data online. My position is that the eventual goal, from which will spring a Semantic Web, is a global web of linked and open data. However, there are many steps from where we are now to achieving that goal. Within audiences that I present to, and/or sit amongst, I see people who for whatever reasons do not ‘get’ one or more of the components of LOD – they cannot envisage opening up any of their data, or think that using a web address for an identifier is over-complex, or have a religious aversion to RDF. As a result they dismiss the whole recipe as not for them, or worse still, as something impractical that will become nothing more than the plaything of a few passionate enthusiasts.
When someone who is still struggling with the concept of opening up their organisation’s data, or with why RDF might be a more useful format than csv, is shown the ubiquitous Linked Open Data cloud diagram with encouragement to join in – it is hardly surprising they remain a little unconvinced. This isn’t a criticism of presenters either. In only 20 minutes on a stage, it is difficult to go into underlying detail.
Let me try in a few paragraphs to break the LOD pavlova into its ingredients:
- Data – In the context of this post, by data I mean machine readable information, produced in a format that can be consumed and processed by other machines. Inevitably, this means file formats such as csv, XML, RDF, etc., but not something like pdf, html, or word, which, although transferrable, are designed for human consumption, not machine analysis.
For some, just this step from their current human targeted format, to a machine readable one, is a significant one.
- Open Data – Data (see above) which is accessible for all to download, view, and consume in a way that is not encumbered by licensing that restricts its use. For example, the licensing used by data.gov.uk data. By definition data which is restricted for certain uses is not fully open.
In our internet based world, openness can also be defined in terms of technical accessibility. If it is only available after a login process, or it is only available to users behind a firewall, it couldn’t be considered as open.
- Linked Data – Data (see above) which contains URIs as identifiers for concepts described in the data and URIs to identify the relationships between those concepts. The four Linked Data Principles, as published as a design note by Tim Berners-Lee, provide a bit more detail on this.
I am in danger of stirring the embers of a religious fire fight here, between those that believe that Linked Data must be described in RDF and contain URIs as identifiers, and those that maintain that you can have data linked across the web without those constraints. All I am going to say on that at this time, is that the Linked Open Data cloud of data sets has been successful, based on the first of those two views. (if you want to follow that particular debate in more detail, Paul Miller’s post and associated comments would be a good starting point)
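To make the Linked Data ingredient a little more concrete, here is a minimal sketch in plain Python, not tied to any RDF library. Every URI below lives under a made-up example.org namespace and every name is hypothetical; the point is only that both the things and the relationships between them are identified by URIs, so a consumer can hop from one statement to the next.

```python
# Hypothetical namespace for illustration; a real publisher would use its own.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = [
    (EX + "id/project/42", EX + "def/locatedIn", EX + "id/postcode/SO164GU"),
    (EX + "id/postcode/SO164GU", EX + "def/district", EX + "id/district/exampleshire"),
    (EX + "id/project/42", EX + "def/fundedBy", EX + "id/council/example-council"),
]


def objects(subject, predicate, data=triples):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in data if s == subject and p == predicate]


def follow(subject, path, data=triples):
    """Follow a chain of predicates from `subject` and return the end nodes."""
    nodes = [subject]
    for predicate in path:
        nodes = [o for n in nodes for o in objects(n, predicate, data)]
    return nodes


# Hop from a project to its postcode, then on to that postcode's district.
print(follow(EX + "id/project/42", [EX + "def/locatedIn", EX + "def/district"]))
```

Because the postcode is a shared identifier rather than a string private to one dataset, the second hop could just as easily land in somebody else's data, which is the whole attraction.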
So, how can data be open, but not linked? – by publishing it in a non-Linked Data form such as a text file or an html page or a pdf. Where would you find this? – all over the web. As encouraged by Sir Tim’s call to give us your raw data now, and as I detailed in my previous “data publishing three-step” post, this is often the first element of getting your data out there for others to consume.
How can data be Linked but not open? – by publishing it in accordance with the principles, in RDF, with URIs, but restricting access either by imposing restrictive licensing conditions or restricting access to the data. Where would you find this? – again all over the web, but often hiding behind restrictive licensing terms such as “non-commercial use only”. Also to be found inside organisational firewalls. For example, commercial organisations can realise the benefits of using Linked Data techniques with their internal private data, potentially linking it to publicly visible concepts across the web to add even more value for their employees.
Data that is Linked and Open, like that strawberry pavlova, has the power to deliver value beyond the sum of its individual ingredients. Providing data in a form that is linked to other data, and easy for others to link to, without restrictions on who does that linking or how it takes place, lays the foundation for a web of linked data built on the same principles that fostered the growth of the web of documents that has so changed our world over the last decade and a half.
The ingredients that formed that World Wide Web of documents – html, [http,] open publishing of web sites without restrictions on others’ ability to consume and/or link to them – individually were important developments. However, when those elements were blended together their effects were multiplied many fold and resulted in the web we experience today.
So [as I stretch my culinary metaphor to its limits] if you are hoping to take people with you in building a Linked Open Data future, you not only have to show them a picture of the final dish, you need to describe the individual ingredients and their relevance to the eventual result.
Pictures from Flickr by PhOtOnQuAnTiQuE and avixyz
Technorati Tags: Linked Data,linkeddata,Open Data,opendata,Semantic Web
-
13:53 Best Buy: Semantic Web and Retail
» Nodalities
In this Nodalities Podcast, I speak with Jay Myers from Best Buy about how he and his team are working within the retail giant to better harness their data. Jay tells us about his use of blogs and RDFa to better manage “open-box” products returned to Best Buy’s many stores in an effort to surface deals to the public and make savings on otherwise costly problems. Jay also explains how Best Buy are publishing the machine-readable data out on the public web and touches on the next steps Best Buy will be taking. He also calls on the Semantic Web community to take an active role in promoting work like this by voting for his panel at South by Southwest, which you can see here.
Jay Myers is a Lead Web Development Engineer for Best Buy, and is an active supporter of the GoodRelations vocabulary for ecommerce, utilizing it for modeling consumer products, stores, and services in both RDF/XML and RDFa. For more information, you can read his blog or catch him on Twitter.
-
7:35 A conversation about The Interactive Knowledge Stack
» Nodalities
My guests on this Talking with Talis podcast are Wernher Behrendt and John Pereira of Salzburg Research. They are part of the team behind IKS – The Interactive Knowledge Stack, an Integrating Project part-funded by the European Commission.
The four-year project started in January 2009 to provide an open source technology platform for semantically enhanced content management systems. The concept behind it is that, once developed, the stack can be bolted on to many different CMS products to add semantic, and semantic web, capabilities. Even though the project is open source, and the obvious use of it is with open source CMS tools, its use could be of equal value to commercial products.
Their target is to engage with 40 small to medium organisations for whom developing such capability would not be possible with their limited resources. They are already well on the way, with many joining in via the project Web site and participating at the first early adopters workshop in Salzburg in June.
Technorati Tags: IKS,Semantic Web,CMS
-
17:12 Linked Data and Health: Speakers
» Nodalities
We’ve had an overwhelming response to our Linked Data and Health open day, which will be running this Thursday in London. There are no places left!
As a quick intro to the day, I’ll post a bit of information about some of our guest speakers here, with a working title for their talks. (Please note that the titles may change.)
Alongside our guest speakers, several of us from Talis will be talking about the wider world of Linked Data, giving an overview, demos of LD applications in use, and doing our best to answer the seemingly simple question: “Why Linked Data for health?”
Dr. Nigam Shah
Dr. Shah’s research is focused on developing applications of bio-ontologies, specifically building ontology-based applications in the biomedical sciences and using Semantic Web technologies to improve search and integration of biomedical information. He teaches at Stanford on topics of how to make and use biomedical ontologies, current trends & future directions in biomedical ontologies and reasoning with biomedical data. He has co-chaired the Bio-Ontologies meeting at the ISMB conference since 2007.
Dr. Shah’s talk is: Opportunities for applying semantic technologies to health care data.
Dr. Michael Wilkinson
Dr. Michael Wilkinson is the Business Development Manager for the NHS National Innovation Centre (NIC). Michael is currently leading on a programme of work to create a linked data platform to speed development of technological innovations likely to benefit the NHS. The NIC works across sectors and encourages collaboration between innovators from industry, academia, and NHS clinicians, scientists, and procurement officials. The NIC also works with other government departments and the EU to improve efficiency of innovation procurement. Prior to joining the NIC, Michael was an academic at the London School of Hygiene and Tropical Medicine. He has also held appointments at the Cabinet Office, Nesta, and hospitals in the USA.
Mark Birbeck
For a number of years Mark Birbeck has been involved in helping to bring about the Semantic Web, and has consulted, written and spoken widely on this and related topics. He is the originator of the W3C’s RDFa standard, and most recently he has been working on a number of semantic web projects for the UK government.
Mark will be speaking with Dr. Wilkinson, introducing the NHS clinical widget platform, which they jointly wrote about in Nodalities Magazine (pdf).
Dr. Jun Zhao
Dr. Jun Zhao is an EPSRC Postdoctoral Fellow from the Department of Zoology at the University of Oxford. She has a computer science research background in various domains, including e-Science, provenance, Semantic Web and biological data integration. She has more than six years’ experience of applying Semantic Web research and technologies to bioinformatics and biological information representation and integration. Currently she is running her fellowship project, Open-BioMed, which investigates the use of Web of Data for publishing and integrating biomedical data resources and the role of provenance information for evaluating their trustworthiness. She is actively involved in both the W3C Health Care Life Science Interest Group and the W3C Provenance Incubator group.
Dr. Zhao’s talk will be: Linked Data for Biomedical Science: A Tale of Two Success Stories
Leigh Dodds
Leigh has significant experience of working with Semantic Web and Web technologies as an independent hacker and researcher, as well as in production environments in a number of roles including developer, software architect and product manager. He has written about, and spoken widely on, a range of semantic web topics including SPARQL, Linked Data, managing and aggregating data on the web, semantic web application development, and data licensing and management. Leigh is currently employed by Talis as the Programme Manager for the Talis Platform and is responsible for both product strategy and business development.
Leigh’s talk is: Why Linked Data for Health?
-
14:00 Wikileaks and the Guardian
» Nodalities
I spoke with the Guardian’s Simon Rogers, editor of the Data Blog, about their decision to publish thousands of facts from the Wikileaks Afghan War Diary. In this podcast, Simon introduces Wikileaks and its use by journalists, and reiterates the Guardian’s strategy of publishing raw data alongside stories and comment. During the conversation, Simon explained his perspective on publishing these leaked data and what people can do with them, pointing out that the Guardian doesn’t put any restrictions on reuse of the facts.
One of the major applications of these raw data, especially anything containing geographical information, is the ability to visualise them. One of the first things the Guardian produced from the leaked data was an interactive map of Improvised Explosive Device incidents affecting troops and civilians.
The opening up of the data behind such applications could prove to be a powerful catalyst for wider visualisation and applications built around the presence of authoritative journalistic facts. Putting the raw data in the hands of the web’s hackers has been a bold move from the Guardian, and I hope to see new and better stories come from the tools made possible by a supply of useful information.
-
15:28 Linked Data and Libraries – almost like being there
» Nodalities
The room was almost full at the British Library Conference Centre for the Linked Data and Libraries event on 21st July 2010, and many who wanted to attend couldn’t because of distance, other commitments, etc.
We therefore took along our brand new screen grabber device and a video camera to capture as much of the day as we could. We have completed the editing process so I am ready to share the videos for those that want to view, or remind themselves of, the day.
Like most of the content we produce at Talis, these videos are licensed under a Creative Commons Attribution License, so share and enjoy.
Technorati Tags: Linked Data,Libraries,Talis
-
15:53 Linked Data in Libraries – Presentations
» Nodalities
The Talis Linked Data in Libraries event, held at the British Library in London on Wednesday 21st July, was attended by 50 enthusiastic people interested in the topic.
Below you will find presentations from the day.
- Introduction: Talis and the world of Linked Data – Zach Beauvais, Talis
- The data.bnf.fr Project – Romain Wenz, Bibliothèque nationale de France (presentation not yet available)
- Linked Data, RDF, and SPARQL – Rob Styles, Talis
- Linked Data in Action – Richard Wallis, Talis
- Lightning Talks: Neil Wilson, The British Library; Sally Chambers, The European Library; Felix Ostrowski, The North Rhine-Westphalian Library Service
- Linked Bibliographic Data – Rob Styles, Talis
- W3C Library Linked Data Incubator Group – Antoine Isaac, Europeana
- An overview of the Talis Platform – Richard Wallis, Talis
Watch this space for videos of some of the sessions.
Technorati Tags: Linked Data,Semantic Web,RDF,Libraries,Talis,ldal
-
18:24 Open Day: Linked Data and Health
» Nodalities
We’ve seen and reported on the rise of Linked Data from concept to practice, and our Open Days have been a great opportunity to explore and explain Linked Data very broadly. The broad discussions have allowed many people to imagine using semantics with their own data, as publishers, developers, information architects etc. across many different industries and applications. But one area in which we are particularly interested is health. Biomedical science is full of structured and semi-structured information, much of which crosses the organising boundaries we’ve created for it. Every aspect of medical practice, research and policy makes use of (and in most cases creates supplementary) information, and it’s become plain that much of this data is stored, hidden and often inaccessible.
I attended some sessions on biomedical semantics at SemTech last month, and was hugely intrigued by the state of health data world-wide. There are many usable ontologies for medical science, for example, which show the relationships among biological knowledge and clinical use; but much of the data used on the front line is not part of this structure. There seems to be much that could be gained from taking a Linked approach to these data!
Mark Birbeck and Dr Michael Wilkinson, in last month’s Nodalities Magazine, introduced the idea of “A Linked Data Platform for Innovation,” a project of the National Innovation Centre for joining clinicians to linked visualisations through a widget-like, Linked Data platform:
The NIC is committed to using Semantic Web technologies as a way to significantly improve the speed and quality of decision-making in the area of health technology innovations.
So, we’ve decided to join forces with some of these minds and host an event to explain and explore biomedical data. We’ll be at No 76 Portland Place on 19th August from 10AM to 4PM. We’ve invited Dr Nigam Shah from Stanford University to talk to us about the state of global health data, and to suggest several ways in which linking can be done in the very near future. We will also cover the topic of Linked Data (what it is, and how it works), as well as taking a quick look at how it’s being used across the web already. The people behind the NIC’s clinical widget platform will also be there to introduce their project.
Places are free of charge but limited, so make sure to sign up to reserve your place.
We’d very much like to keep the spirit of an Open Day. This event is open for discussion, examination and exploration of using the Semantic Web in life sciences, so come armed with ideas, questions and problems!
Talis will be putting on lunch, and we will also have a ready supply of coffee on hand to help the discussions.
Image: “Science is Knowledge” by Zach Beauvais, is a mashup of “3D Stone Cells” by BlueRidgeKitties, and “Glass Bottles I” by Tim O’Brien via flickr. They are used under CC: BY, NC, SA licenses.
-
11:13 Tom Steinberg talks about the Public Sector Transparency Board
» Nodalities
Tom Steinberg of mySociety fame joins me on this Talking with Talis podcast to discuss the approach to open and linked data in the context of the UK Government.
We talk about his role over the years; the emergence of data.gov.uk as part of the previous administration’s Making Public Data Public initiative; and the subtle change of emphasis accompanying the new administration’s name change to the Transparency Programme.
Finally we move on to the role of the newly formed Public Sector Transparency Board of which he is a member.
Technorati Tags: Tom Steinberg,Government Data,Open Data Commons,Linked Data,data.gov.uk,Talking with Talis
-
22:25 One Step at a Time
» Nodalities
I expected some comments to my Data Publishing Three-Step post last week but what I didn’t expect was a virtual pat on the head with an accompanying croon of "Who’s the clever boy, then? You are! Yes, you are!" in a reply post—I’d love to dance with you, but…— from Dorothea Salo. The problem she identifies in her, politely phrased, complaint about my reductionist approach is this:
Aside from my friends the open scientists (and not even all of them, to be honest), practically all the data-producing researchers I know are firmly stuck on Step 1. Firmly stuck, not to say "immovably." As for Step 2… trust me, these folks are not data modellers. I sincerely doubt my own capacity to teach RDF to someone who approaches me asking, "Is it okay if I record my data in Excel?"
And I totally agree with her. It would be great in an ideal world if data creators could take all three steps to publish in a linkable, queryable form. But as she identifies, many folk are not data modellers, and wouldn’t want to be. The three steps I identified are there for motivated people to take as many, or as few, as are compatible with their work and motivations. All anyone could ask is that they at least have an awareness that others may have sufficient interest and motivation to take their data through the next step.
Starting with getting your data out there, in any form (yes even Excel, if that is your tool of choice), is the foundation. Without the data in a form that you, and others, could reuse, there is little point.
So expanding on my Step 1. description:
- Publish your data.
- Publish it in a way that others can use – in a known format from which you can easily extract the actual data elements (Excel, csv, etc. not pdf, or word).
- Publish it with an explanation of what the data is, and where to get it.
- Publish it under simple unambiguous licensing terms, without ambiguous restrictions such as ‘non-commercial only’.
- If possible, identify things in the data using well known identifiers – not ‘substance_1234’ where H₂SO₄ could be used, or ‘location_abc’ where Paris, FR would do.
- If there is not a suitable well known identifier set, create your own but publish that as well.
- Be consistent, with yourself and others around you – don’t go reinventing wheels.
- Publish your data!
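As a rough illustration of the "well known identifiers" point in the list above, here is a minimal Python sketch that swaps in-house codes for recognisable values before writing a CSV file for publication. The lookup table, column names and sample rows are all hypothetical; only the two example codes come from the list itself.

```python
import csv
import io

# Hypothetical lookup from in-house codes to well-known identifiers.
# The two mappings echo the examples given in the list above.
WELL_KNOWN = {
    "substance_1234": "H2SO4",
    "location_abc": "Paris, FR",
}


def to_publishable_csv(rows):
    """Return CSV text with in-house codes swapped for well-known identifiers.

    Codes without a known mapping are left unchanged so nothing is lost.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical column names, chosen to be self-explanatory to outsiders.
    writer.writerow(["substance", "location", "quantity_kg"])
    for substance, location, quantity in rows:
        writer.writerow([
            WELL_KNOWN.get(substance, substance),
            WELL_KNOWN.get(location, location),
            quantity,
        ])
    return out.getvalue()


# Made-up sample data standing in for an internal spreadsheet.
internal_rows = [
    ("substance_1234", "location_abc", 12),
    ("substance_9999", "location_abc", 3),
]
published = to_publishable_csv(internal_rows)
print(published)
```

Any code that has no well-known equivalent passes through untouched, which is where publishing your own identifier list alongside the data comes in.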
Whilst taking Dorothea’s point about the difficulty in just convincing some people of the merits of exposing their data, none of this is rocket science, and it neither mentions Linked Data nor offends her longtime RDF scepticism.
Get significant amounts of data out there, and hopefully others will be motivated enough to use it usefully to add value by linking it to other data – maybe that will help demonstrate the worth of steps 2 and 3.
Picture published on Flickr by paraflyer.
Technorati Tags: linkeddata,Linked Data,Opendata,RDF
-
22:33 Facebook: David Recordon talks with Talis about the Social Graph
» Nodalities
We’ve covered the launch of Facebook’s Open Graph protocol in Nodalities Magazine, discussing its potential impact on Linked Data. So, I invited David Recordon, Facebook’s Senior Open Programs Manager, to talk with Talis about Facebook and the Open Graph Protocol. We ended up talking all about the protocol, how developers can make use of it (and why), as well as touching on Facebook’s view of social networking as a graph.
The Open Graph Protocol page has information about the protocol itself. Facebook’s f8 developers’ conference site also has links with more information for developers.
-
11:00 The Data Publishing Three-Step
» Nodalities
In a conversation with data owners about how they should be publishing their data, it is usually not long before the following question turns up: “So, what do I actually have to do to publish my data?” Often the conversation then wanders off into a game of buzzword bingo (RDF, RDFa, SPARQL, dereferenceable URIs, triples, content negotiation, open data, Linked Data, end-points, etc.), to be followed by a blank look and the unuttered question “Yes, but what do I actually have to do to publish my data?” In an attempt to simplify the answer to that oft unuttered question, I break things down into three steps.
Step 1 Get your Data Out – for others to consume
Sounds simple. Just take the spreadsheet (or similar file) that you use to track information, post it on your web site and link to it from a description posted in an accompanying web page. It can be that simple, but there are things to consider:
- Licensing – will potential consumers of the data be confident in their ability to use and/or reuse it? (The UK Government are very clear on this.)
- Is it open but opaque? – The terms, codes, identifiers etc. you use may be meaningless, or worse still ambiguous, to those outside your organisation, or even your department.
- Could your data be made more consistent with other data you, or similar organisations, already publish?
All things to be considered, but not to be put up as excuses for not publishing.
Step 2 Get your Data In – to an open linkable standard format
This is the most powerful step. It consists of identifying the elements in your data (organisations, locations, things, projects, types, etc.), giving them unique identifiers, and then making those identifiers web links. Fortunately this may not be as onerous as it sounds. There are many publicly visible/usable identifiers that you can use for your data.
For this step to be effective you really need to be modelling your data: your [first class] data elements and the relationships between them, plus possibly relationships with external entities. The output of this step will be an RDF representation of your data that follows Linked Data Principles. You should also identify the process or rules to get from your source data into this new form, enabling you to repeat the step for later versions of your data.
Having said all that, it is not necessarily only you that will/can do step 2. It is perfectly possible for a third party, or a central organisation such as data.gov.uk, or even an enthusiast, to carry out this data modelling and transformation step with data that you have openly published.
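To make step 2 a little more concrete, here is a minimal Python sketch of a repeatable transformation: it gives each element in a source row a web identifier and writes the relationships out as N-Triples, one simple RDF serialisation. Everything named here is an assumption for illustration: the `example.org` base URI, the `Project`, `name` and `fundedBy` terms, and the sample row are hypothetical, and a real model would reuse existing vocabularies and identifiers wherever possible.

```python
from urllib.parse import quote

# Hypothetical base URI and vocabulary; replace with ones you control or reuse.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/def/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def mint_uri(kind, local_id):
    """Build a web identifier for one data element, e.g. a project."""
    return f"{BASE}{kind}/{quote(str(local_id), safe='')}"


def escape_literal(text):
    """Escape the characters that N-Triples requires escaping in a literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_ntriples(row):
    """Turn one source row (a dict) into a list of N-Triples lines."""
    project = mint_uri("project", row["project_id"])
    org = mint_uri("organisation", row["org_id"])
    name = escape_literal(row["name"])
    return [
        f"<{project}> <{RDF_TYPE}> <{VOCAB}Project> .",
        f'<{project}> <{VOCAB}name> "{name}" .',
        f"<{project}> <{VOCAB}fundedBy> <{org}> .",
    ]


# Made-up source row standing in for a line of a published spreadsheet.
rows = [{"project_id": "P 42", "name": 'The "Lion" survey', "org_id": "bis"}]
triples = [line for row in rows for line in row_to_ntriples(row)]
print("\n".join(triples))
```

Because the rules live in code rather than in someone's head, the same script can be re-run against later versions of the source data, whoever ends up doing the transformation.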
Next you need to publish your data so that it can become part of the Web of Linked Data, which brings me, with apologies to fans of the traditional party song, to…..
Step 3 Link it all about
Going through step 2 and then not making your data available, or not providing useful information at the end of the links you embed in your data, would be a bit of a pointless exercise. How to publish this data is the next question, to which there are at least three equally valid answers.
- Using an encoding technique called RDFa, you can embed the RDF data within the HTML coding of a web page, so that software can obtain a more structured representation from the page than a human viewing it in a browser would.
- You could just publish the RDF in rdf files on your web server. A good example of this is the way the BBC publish the RDF for many of their pages, such as those for their Wild Life: the Lion web page and the RDF for Lion (dependent on your browser, you may need to use its view page source option to see the actual RDF encoded in XML).
- You could store the individual RDF statements (triples) in a triple store, or SPARQL end-point. This not only publishes the RDF, but also enables the data and relationships within the data to be queried. This is how data.gov.uk publishes RDF, from Talis Platform Stores. The interface might look a bit cryptic, with the results, formatted in XML, in the top box and the SPARQL query that produced them in the bottom box, but it is a developer’s interface demonstrating the code and results an application might use, so you wouldn’t expect much different.
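For the SPARQL option, the query that the original data.gov.uk link carried can be reconstructed from what survives of it: it asks for the names and identifiers of ten schools. The sketch below, in Python, shows how an application might package such a query as the `query` parameter of a GET request. The endpoint address is a hypothetical placeholder, the start of the original link is truncated so any earlier PREFIX lines are left out, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Query text as recoverable from the original link. The start of that link
# is truncated, so any earlier PREFIX lines are omitted here.
QUERY = """PREFIX ed: <http://education.data.gov.uk/def/school/>
select ?name ?id
WHERE {
   ?id a ed:School;
         ed:establishmentName ?name .
}
LIMIT 10"""

# Hypothetical endpoint URL, used only to show the shape of the request.
ENDPOINT = "http://example.org/sparql"


def build_request_url(endpoint, query):
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the same query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

The long percent-encoded string this produces is what makes such links look cryptic; decoded, the query is only a few lines long.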
I’ve decided to go through these steps, can you remind me again why?
So that your data can be linked with other data to add value to the experience of consumers of your data and services, and so that others can use your data to add value elsewhere. A good example of this in action is the BIS Research Funding Explorer.
Technorati Tags: linkeddata,Linked Data,opendata,Open Data Commons,RDF,SPARQL,Talis,Talis Platform
-
15:02 Some Clarity on Transparency
» Nodalities
Since the Conservative-Liberal coalition replaced the Labour Party as the UK party of government, there has been a question about how the Conservatives’ approach to opening up public data would change the "Making Public Data Public initiative", and its influence upon data.gov.uk. Advisers Sir Tim Berners-Lee and Professor Nigel Shadbolt were retained by the incoming administration, who also introduced Tom Steinberg, founder of mySociety, as their man in this area. They also made some pronouncements about using open standards and openly publishing data, but there was not much initial detail behind this.
Last week saw the first meeting of the Public Sector Transparency Board, chaired by Francis Maude, the Minister for the Cabinet Office. He was joined by these three advisers and Doctor Rufus Pollock, from Cambridge University.
Their first task was to discuss new public data transparency principles, which have been reproduced in a post on the data.gov.uk blog. These eleven draft public data principles go a long way to reflect the thinking of this group and how they intend to take forward the initiatives of their predecessors.
Key points that attracted my attention include:
- Release data quickly and then republish it in linked data form later on – getting the data out there being the most important step in this process, formats being a secondary consideration.
- Public data will be available and easy to find through a single easy-to-use online access point–this access point being data.gov.uk.
- Data will be released under open licenses and in machine readable form, following World Wide Web Consortium recommendations and standards—linked data.
So, on the surface things don’t look that much different to what they did before the government changed—the commitment to publishing data in any format that was useful initially, and then a commitment to move towards making it machine-readable and linkable.
There does seem to be a drive to go further and deeper than their predecessors, both in publishing financial data and, according to anecdotal evidence, in asking government departments to discover all the datasets they have that have not yet been published – a bit of a Donald Rumsfeld situation, methinks.
Any concerns that changing the name of the initiative from Making Public Data Public, to the Transparency Agenda, would affect the progress of these initiatives seem, from these early draft principles, to be unfounded. From my point of view good for open data, good for Linked Data, good for data.gov.uk, good for UK government, and good for all of us.
Picture from Flickr by liber
Technorati Tags: Open Data,Linked Data,Government Data
-
0:55 SemTech quick notes
» Nodalities
I’m here in San Francisco at the Semantic Technology conference with a cohort of Talisians and a bunch of the world’s Semantic Web companies and thinkers. I’ll pull together some various posts from the things happening here, but if you wanted to follow my more raw and unpolished notes, mostly from sessions I’m attending, you can have a look at my tumblog. I’m not promising to cover everything, but I’ll have a go.
-
16:50 Push-Data: Alex Passant talks about sparqlPUSH
» Nodalities
This year, Talis sponsored the final Scripting for the Semantic Web challenge at the Extended Semantic Web Conference (ESWC). The winners of the challenge were Alexandre Passant and Pablo Mendes for their sparqlPUSH project. sparqlPUSH brings an element of real-time to working with Linked Data. Instead of needing to poll for new data periodically, sparqlPUSH works alongside PubSubHubbub to effectively push data out to where you need it. I spoke with Alex Passant at the conference about the challenge, real-time data, and sparqlPUSH. Alex also wrote about the scripting challenge on his blog.



















