Feeds
Tech Semantic Web and Linked Data
(600 unread)
-
OpenLink Community Blog (88 unread)
-
Z-Blog
(251 unread) -
Nodalities
(47 unread) -
EN - Flux RSS - R & D (11 unread)
-
The Semantic Puzzle
(69 unread) -
OpenCalais - Official Blog
(6 unread) -
Chief Marketing Technologist (124 unread)
-
Semai
(1 unread) -
Reactive, autonomous (3 unread)
Tech Coding the Web and Software
(157 unread)
-
Software Cooperative News (69 unread)
-
Talance Friendly Web Tools Blog (88 unread)
Tech General News
(14551 unread)
-
Tech Eye - Latest technology headlines (4226 unread)
-
BBC News - Technology (4632 unread)
-
NYT > Technology (5693 unread)
Knowledge Man and Eng
(310 unread)
-
ISKO UK (56 unread)
-
KOnnect
(10 unread) -
CELSTEC Publications
(216 unread) -
Knowledge Engineering (19 unread)
-
Open Intelligence
(9 unread)
Friends
(312 unread)
-
VISION AFORETHOUGHT
(82 unread) -
Snell-Pym
(230 unread)
Newspapers
(27568 unread)
-
The Guardian World News (10996 unread)
-
The Independent - Frontpage RSS Feed
(16572 unread)
Politics UK and Ireland
(1188 unread)
-
Liberal Democrats RSS (482 unread)
-
Green Liberal Democrats News Stories
(100 unread) -
Liberal Democrat Christian Forum
(5 unread) -
Liberal Youth - Latest News
-
The Alliance Party of Northern Ireland News Stories
(528 unread) -
Home
(73 unread)
Politics EU and International
(827 unread)
-
European Movement UK (38 unread)
-
European Movement Ireland
(91 unread) -
OSCE press releases and media advisories (376 unread)
-
ALDE News
(39 unread) -
ELDR News
(283 unread) -
IFLRY News and Updates
Religion Christian
(2255 unread)
-
Church of England News (265 unread)
-
Latest News
(730 unread) -
Open Path
(19 unread) -
Affirming Liberalism
(11 unread) -
Greenbelt Blog (464 unread)
-
Fresh Expressions RSS feed (452 unread)
-
Emergent Village
(56 unread) -
Taizé (258 unread)
Religion Interfaith and Universalism
(508 unread)
-
Interfaith (230 unread)
-
IDC Interfaith Dialog Center
(141 unread) -
Inter-Religious Dialogue
(137 unread)
The Semantic Puzzle
-
8:51 Has Google hi-jacked the Semantic Web?
» The Semantic PuzzleJust recently Google has launched the ‘Knowledge Graph‘ (GKG) which “understands real-world entities and their relationships to one another: things, not strings.” Has Google hi-jacked the idea of the ‘Semantic Web’ or at least its vocabulary?
Sean Golliher has compared the most central concepts of the SemWeb community to the wording of Google in his blog post, for instance: Google doesn´t talk about ‘Linked data’ or ‘URIs’ but rather about ‘things and their relationships’. We don´t know if Google uses standards like RDF but obviously a lot of concepts and ideas developed by the SemWeb community in recent years were implemented in GKG. Some people complain that Google should clearly state that this is an implementation of the ‘Semantic Web’ (which was not invented by Google), others say that most concepts like ‘taxonomies’ have been around for hundreds of years anyway.
I believe that both sides have now a great chance to work together: Whether Google’s goal, to “build the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do”, can be reached or not is a matter of the intelligence of the employees. A lot of potential can be found within the semantic web community: If Google gives credit where it is due, semantic web people will be a bit more inspired to support an eco-system built around GKG – and it won´t last long until an ‘Open Knowledge Graph’ will fit together with Google´s revenue model.
-
16:13 PoolParty PowerTagging – bringing semantics to enterprises
» The Semantic PuzzlePoolParty PowerTagging (PPP) is on its way: By extending Confluence´s label management, new application scenarios which make use of content recommendation and semantic indexing will be supported soon. PPP will be published at this year´s Atlassian Summit and at SemTechBiz in San Francisco at the beginning of June.
Tagging is still not a very popular task, especially in corporate environments. Many users don´t see the benefit of creating metadata to describe the actual content. A typical counter-argument to social tagging is that there are too many words for the same thing. “Even if I am tagging very hard my colleagues won´t find necessarily my pages because they will use different words to search for the content. I don´t have enough time to insert ‘New York City’, ‘NYC’, ‘Big Apple’ etc. as labels”.
The result: Tagging facilities of enterprise software platforms like Confluence are rarely used and don´t help to index content at all. Search is mostly based on classical full-text indexing. Semantic search as seen more and more on the WWW has still not entered the enterprise realm.
The Solution: thesaurus based indexing
W3C´s Semantic Web technology stack provides means to define controlled vocabularies like thesauri which results into more and more tools and data which make use of standards like SKOS. Tagging based on thesauri means that concepts are attached to pages & documents rather than putting labels on them. Labels like ‘New York City’, ‘NYC’ and ‘Big Apple’ refer to the same concept, thus it should be sufficient if one of the various terms is used for labeling, all the other names of this certain concept should be attached automatically.
PoolParty PowerTagging is able to analyse each Confluence page and to insert concepts from a thesaurus and all of their names automatically. Users can curate all suggested tags or they can also index their spaces automically resulting in a semantic index which makes search more comfortable than ever before.
Usage: enhanced collaboration with enterprise knowledge models
There are two main application scenarios which can be realised on top of Confluence and its PowerTagging extension:
- Semantic Search: Fully integrated with Confluence´s built-in Lucene based search facility, users no longer have to type in search phrases literally: Even if only ‘New York City’ is mentioned on a page on a word-by-word basis, it´s sufficient to search for ‘Big Apple’ or ‘NYC’ and results will be generated. This feature is especially interesting for domains in which a lot of technical terms or abbreviations are commonly used or for enterprises in multi-lingual environments.
- Content recommendation: Identifying similar and semantically matching contents especially in larger Confluence instances is a crucial task: Imagine you´re working for a recruiting company and you would like to match a new open position with all people in your applicant database. Or: Imagine you´re working on technical documentation and you can provide your customers automatically with further readings. Or: Imagine you´re working on a slidedeck and you´ll see instantly if some of your colleagues have worked on similar issues recently.
Don´t re-invent the wheel again and again. Save time and money. PPP will help to fulfill these tasks when creating rich contents more efficiently than ever before. You can link similar contents within Confluence automatically and you can fetch further readings even from the WWW like from Wikipedia.
If you are interested in trying out PowerTagging, please drop us a note and we will be happy to support you!
-
7:49 Exploiting Big Data: Linked Data and SKOS
» The Semantic PuzzleYesterday I gave a webinar covering the question which role SKOS plays in the linked data game. Just the day before I discovered an interesting white paper published by Fujitsu which clearly states that linked data and SKOS are excellent approaches to ‘create additional value in linking and exploiting big data for business benefit’.
I had at least five scenarios in mind in which SKOS and linked data in general can be combined. Take a look at the slides or watch the video to find out …
- how to publish SKOS thesauri as linked data
- how to generate SKOS from LOD sources like DBpedia
- how to make use of SKOS thesauri for entity extraction & content enrichment from LOD sources
- how to use linked data mechanisms for collaborative thesaurus management
- how to use SKOS for linked data alignment & better disambiguation
-
16:11 reegle.info – linked (open) energy data cloud
» The Semantic PuzzleAccess to the latest high quality information on renewables, energy efficiency and climate change is fundamental to the acceleration of the clean energy marketplace, facilitating investments, promoting new legislation and regulations and broadening interest and knowledge in the sector.
reegle.info acts as a unique clean energy information portal, targeting specific stakeholders including governments, project developers, businesses, financiers, NGOs, academia, international organizations and civil society. Alongside comprehensive country energy profiles, energy statistics and a directory of relevant stakeholders it also offers the clean energy search and an extensive glossary. There is also an insightful clean energy blog with interesting and up-to-date background information.
As reegle.info provides relevant clean energy data from several key energy open data sources as for instance OpenEI, World Bank Data or the UK Open Data Portal the reegle.info Information Gateway has a strong need for efficient and automated data management mechanisms and technologies! Therefore REEEP (The Renewable Energy and Energy Efficiency Partnership) decided to use Linked Open Data (LOD).
Linked Open Data provides a powerful way for reegle.info for sustainable data management and data integration – thereby the current reegle.info linked (open) energy data cloud came into being and looks as follows:
The figure of the reegle.info linked (open) energy data cloud above shows the model behind the scenes of the reegle.info clean energy information gateway providing an insight about sources and respective connections / links between the several sources and data sets.
For the realisation of the Linked Open Data based reegle.info system the following software components are in use:
- PoolParty Thesaurus / Vocabulary Management System
- SWCs Linked Data Manager
- OpenLink Softwares’ Virtuoso Triple Store
Going this direction the reegle.info clean energy key portal is very flexible for future expansions in the fields of data integhration and data management by new data sets from several data sources!
By the way – reegle.info is very open too – thereby the whole REEEP generated data is available via a Sparql endpoint for free re-use under the UK Open Government Data license on the reegle.info data portal!
Try it out and make use of free high quality clean energy data!
-
11:39 LOD2 Plenary Vienna (March 2012) – 3rd day – afternoon session
» The Semantic PuzzlePromising title. After two and a half day (well for almost all of us) we entered the final phase of the plenary. So two and a half days of intense and interesting discussions catching up with all that has been done so far and planning what should happen the next half year. But still two session in front of us.
The afternoon started with the discussion of WP9 the “Open Government Data” use case. First Uroš Milošević from Institut Mihajlo Pupin (IMP) reported about the Serbian CKAN project already holding some data from the Statistical Office of Serbia. Also tools from the LOD2 stack have been and will be used for this project. Sounds great!

Then Irina Bolychevsky of OKFN continued the session announcing that a better integration between CKAN and LOD2Stack should be made to get more RDF in publicdata.eu. Good idea! We were collecting ideas for integration and talked about e.g. a wizard for generating RDF from .csv files (ULEI is working on something like that). Also a integration of google refine has been discussed. The consortium decided to make an extraction sprint transforming a (to be defined) number of interesting data sets from CKAN to RDF.
Finally we had a discussion if linked data is a (the) solution for CKAN to find data and find related data etc. Well i think the people in the consortium are pretty sure it is (not so sure if people from OKFN are). Irina and Mark from OKFN invited everyone to provide input to the Use Case.

This session ended with a presentation about WP9a from Jindřich Mynarz from UEP and Martin Nečaský from CU. They are developing a distributed market place for public contracts. A ontology for public contracts has been developed and is open for review on google code. Next step here will be a web application for filing/creating public contracts in RDF as linked data using tools from the stack. So all in all pretty good progress in WP9.

The third day and the plenary ended with Martin Kaltenböck from SWC and Sören Auer our project lead from ULEI presenting WP10-11-12 Dissemination, Exploitation and Project Management. First we voted for our next plenary to be in Cambridge (hosted by OKFN). Past dissemination activities have already been presented on day one, so Martin reminded us all to write blog posts about all the great things we are doing in LOD2. Next big dissemination activity and also a good opportunity to meet people from the consortium will be the European Data Forum from June 6-7 in Copenhagen.
And that was pretty much it. I as i hope all the others enjoyed three days with a bunch of great people from all over Europe working on a great project. As always it was intense but it was also fun. Hope everyone had a save trip home!
-
21:38 Webinar: Linked Data and SKOS. Connecting the dots.
» The Semantic PuzzleRegister now!
For many organizations SKOS turned out to be the entry point to the Semantic Web. See how SKOS and Linked Data fit together:“Linked Data and SKOS. Connecting the dots.”
(Wed, Apr 18, 2012 5:00 PM – 6:00 PM CEST)Learn more about:
• How the creation of thesauri can become more efficient when built upon existing linked data sources
• How SKOS thesauri can be aligned with LOD sources and can be published as an LOD source itself
• How linked data mechanisms can be used to improve decentralized vocabulary management
• How SKOS and linked data alignment can be used for efficient schema mapping and value mapping
• How SKOS thesauri can be enriched with linked data to realise semantic search engines in a very efficient way
• How collaborative platforms like Sharepoint or Confluence can benefit from SKOS based knowledge models in combination with linked dataThis webinar is powered by LOD2.
-
9:00 Is Irish an official language in Britain? – Five arguments in favour of thesauri
» The Semantic Puzzle
The article ‘Five arguments in favour of thesauri & controlled vocabularies‘ shall not only clarify if Irish is an official language in Britain but also how thesauri and linked data mechanisms can support the following tasks:- Make knowledge explicit and available throughout your department, your organization, your partner network, the whole world!
- Conquer the Babylonian language confusion!
- Help to make search and research activities more efficient!
- Link all your information sources in a meaningful and standardized way!
- Make skills, interests and knowledge of experts visible!
The paper (PDF) should help to make the world of thesauri and semantic knowledge models accessible intuitively.
-
10:34 José Manuel Alonso: “If you want to scale up, you should consider LOD”
» The Semantic Puzzle
José Manuel Alonso has been working for W3C and CTIC in many open data projects. At the Web Foundation he promotes and supports (linked) open data in developing countries. Martin Kaltenböck from SWC talked with José about ongoing activities in the area of Open Government Data.Open Data is a powerful worldwide movement these days. Regarding open data projects in developing countries and in high industrialised countries (Europe, US, Australia et al) where do you see the main differences – regarding organisational – cultural – technical issues?
We conducted feasibility studies in Ghana and Chile several months ago, are supporting the Ghana government on the development of its national initiative and have visited and have engaged in Open Data discussions with many other countries in Africa, Latin America and Asia.
The situations are quite diverse and can vary significantly from country to country. It is always difficult to generalize, but I think there are a few important differences that can be highlighted (in no particular order):- The amount of information available in digital form is generally much lower
- The IT infrastructure is yet to be fully developed or under development
- The capacities on the government and civil society side have to be improved
- The mobile phone is the main device to access information but data connectivity is still scarce, only available in the big cities and not at all in the rural areas
- Digital literacy related issues have to be seriously considered and addressed
- Multilingualism is an important factor, as there are dozens of dialects being spoken in many countries
Said all of the above, I would say that there are also quite a number of commonalities such as privacy and security concerns, the resistance to change but also the existence of champions within government, and the interest and willingness in civil society, that is already producing a number of interesting applications.
You are also very familiar with the concept of Linked Open Data (LOD) – where do you see the main benefit in using LOD – where do you think are the main challenges – where the main obstacles?
Having managed a few projects achieving 5-star open data, I’ve learned a thing or two about the pros and cons. I’ve been saying consistently that there are a few important issues:
- There is still little knowledge about LOD out there and it is perceived as too complex
- The demand for LOD is, hence, very low
- The tooling is not powerful enough yet, specially when compared to XML tooling and others
- The modeling part is very tough
People are used to work with XML and Web Services and believe that anything along this line such as REST+JSON fulfils most expectations and needs. But this is not fully true. In my opinion, the power of LOD resides on the linking part more than anything else. Combination of data from disparate sources using RESTful techniques is much more difficult while it’s a natural fit for LOD.
My experience tells me that for dealing with few and simple datasets, investing in LOD is not really needed, but if you want to scale up and, specially, if you want to link and integrate, then you should consider LOD. It is generally a bigger investment but it pays back for interlinking big volumes of information, facilitates re-use in multiple formats, and can get very powerful when using SPARQL appropriately as it allows access to the whole underlying knowledge base.Where do you see the main differences regarding effort of publishing and benefit in re-use (or the re-use itself) between Open Data and Linked Open Data?
I would say that the main difference here is between using the Web as an archive for files and using the full potential of the Web. If one publishes hundreds of spreadsheets on the Web using an open format and license, he is already doing Open Data, but more than using the Web, he is going back to the FTP days. And that is not too different from giving away a USB stick with the files. We can do much better nowadays.
The often cited Tim Berners-Lee’s 5-star scale is a good reference here. The higher you can achieve on that scale, the more power of the Web you are using, the more you are facilitating reuse.
Are there differences regarding the use of LOD principles and technologies between developing countries and industrialised countries in your opinion? For example: does it make sense to start an Open Data Initiative in a developing country using Linked Open Data from the scratch?
All the issues with LOD I mentioned above apply and are even more strongly found in the developing world. I think we should take a step by step approach and start going from no data to some-star data in the very near term, lower the barriers one by one and start to building capacities in government and civil society but always with Web architecture principles in mind.
We will have to address the specificities of the developing world. For example, given that the LOD community is relying more and more on cloud-based options, on centralized data stores that require stable high-speed internet, how would one deploy a LOD solution in a country where clients (computers/mobile phones) have limited resources (disk, cpu) and where connectivity is unstable and with low-bandwidth? We’re participating in a worskhop to explore these issues.This does not mean that LOD is completely ruled out from the beginning. As I pointed out before, there are cases on which it can be extremely useful and powerful and in those, we intend to accelerate adoption, likely piloting and building capacities as a first step.
Could you please tell us a few words about the Web Foundation?
The Web Foundation was launched by the inventor of the Web, Sir Tim Berners-Lee, in 2009 to address global challenges by connecting humanity and empowering individuals through an increasingly inclusive and powerful Web. More on the vision of the Web Foundation at:
[www.webfoundation.org]Jose, many thanks for this interview. It seems that there is a quick progress in open data in developing countries as well as there are different requirements there to be taken into account in comparison to open data projects in Australia, the US or in Europe! Also the potential of Linked Open Data seems an interesting point for these countries!
We are looking forward to staying in touch with you on this in the future and wish you all the best for your future work in this area!
-
11:22 Automatic text analytics using DBpedia and PoolParty – A Live Demo
» The Semantic PuzzleLet me show you which steps have to be taken to generate a high-quality text mining application, ready to be used to annotate and to categorize any kind of text or documents covering nearly any domain. With our approach of thesaurus based text mining your documents can also be linked to the world of linked (open) data; enrich your documents with data from the LOD cloud!
Step 1. Generate a thesaurus by using a linked data source like DBpedia
As recently reported SWC has developed a tool called SKOSsy which can be used to extract seed thesauri from DBpedia. In our example I will generate a knowledge model describing the domain of “digital photography“. This step took around 15 minutes.
Step 2. Load the thesaurus into PoolParty and improve it to your needs
After the seed thesaurus has been loaded into PoolParty Thesaurus Manager you have many possibilities to enhance the knowledge model further: Add more categories, synonyms, relations etc. In this example I use the seed-thesaurus without any further improvements. This step took approximately 2 minutes.
Step 3. Generate an automatic text extractor on top of your thesaurus
This step took a couple of seconds and ended up in having generated a fast and reliable text mining application on top of PoolParty Extractor, ready to be used to enrich your documents with data from the LOD cloud.
You can try it out here: PPX Live-Demo
To try the extractor on your own, please take a look at the image above which shows a proper configuration, you have to insert the following UUID in the form: d35d4ddb-adc3-4ea5-b027-deacac03e391
Since our example is all about ‘digital photography’, we recommend to use text samples (or some fragments) like these ones to test the quality of PPX based text analytics:
- Digital Camera Image Noise (Results as HTML, RDF/XML)
- Nikon D3S In-depth Review (Results as HTML, RDF/XML)
- Introduction to Shutter Speed in Digital Photography (Results as HTML, RDF/XML)
- Digital Camera Sensors (Results as HTML, RDF/XML)
Let us know what you think about this straight-forward approach and your opinion about the quality of the results. We believe that thesaurus based text mining is in many cases an alternative to some other approaches, especially if you want to to enrich your content with information from the upcoming web of data.
Of course we would be happy to generate other demos in the areas of your interest! Just get in contact with us by using our contact form.
-
8:18 Linked Open Data: The Essentials – A quick start guide for decision makers
» The Semantic PuzzleTogether with REEEP (Renewable Energy and Energy Efficiency Partnership) the Semantic Web Company (SWC) has composed a fundamental publication on the topic of Linked Open Data.

Linked Open Data: The Essentials provides answers to the following key questions:
- What do the terms Open Data, Open Government Data and Linked Open Data actually mean, and what are the differences between them?
- What do I need to take into account in developing a LOD strategy?
- What does my organisation need to do technically in order to open up and publish its datasets?
- How can I make sure the data is accessible and digestible for others?
- How can I add value to my own data sets by consuming LOD from others?
- What can be learned existing best practices?
- What are the key potentials of sharing and consuming open datasets?
Read more about this publication and find out how to obtain a copy.
-
14:15 SKOSsy-Lottery: Free Pass to Semantic Tech & Business Conference, Berlin
» The Semantic PuzzleAs PoolParty Team is present at SemTechBiz Berlin 2012 (February 6-7), we want you to join us. This is why we have issued a little lottery to give away a full conference pass (€795) plus our unique PoolParty Cocktail Shaker in a set
How to enter the SKOSsy-lottery:
- Enter a comment in this post. One comment per person. Describing which type of thesaurus you are interested in.
- All comments must be submitted before Jan 25, 2012.
- The winners will be selected at random.
Together with our PoolParty Suite, we are ready to present SKOSsy on our booth at SemTechBiz Berlin 2012 Exhibition area. SKOSsy is a handsome tool, which generates SKOS based seed-thesauri in German or in English by extracting data from DBpedia. See our finger exercise on a thesaurus describing the world of Alan Turing – done with SKOSsy.
Let us know, which knowledge realm you are interested in and join the lottery now. Good luck, and see you in Berlin.
-
15:29 The ESA vocabulary site – Making Publishing and Reusing Vocabularies Easier
» The Semantic PuzzleReviewing the interview we made with Les Kneebone (project manager of the vocabulary projects at Education Services Australia) in November 2010 we can see that ESA has been one of the early adopters of SKOS as a standard for thesaurus development. Les said then: “We had already identified SKOS as an important standard for ScOT so it was natural to select PoolParty as our new thesaurus management tool”. Around a year later ESA´s vocabulary site went online with PoolParty as its basis.
We asked Les to comment on his statement from last year and he confirmed that SKOS continues to be central to the ESA vocabulary business model and that it has also been important for ESA that PoolParty has been flexible enough to support continued publication of non-RDF formats, especially IMS VDEX.
In the course of this project it became more and more obvious that SKOS cannot only be used as yet another format for publishing thesauri but rather as a unified model to build thesauri in general. This approach made possible several improvements to the vocabulary development model and the maintenance process of ESA. Since all data is stored as RDF in a triple store, and SKOS and RDF are flexible formats supporting interoperability and interchangeability of data, many manual transformations that had to be done before are not needed anymore and all other systems using the vocabularies are dynamically fed by PoolParty offering the data in its needed formats (see image below).
Changes in ESA’s vocabulary development model
Les states that while some manual processes still exist to support legacy systems, PoolParty ensures the integrity and richness of ESA data. Support and customizations for legacy systems can be achieved in the confidence that the linked-data capabilities are centrally managed and stored in the PoolParty triple store.
From the publishing perspective, the previous vocabulary publishing site has been replaced by the PoolParty Linked Data Frontend (LD-Frontend) that has been customized especially for this project to offer more flexibility in the display and the layout of the data. Similar to the frontend for the Austrian Geological Survey mentioned in a previous blog post , the LD-Frontend has been adapted to the ESA styleguide and the display of the data in the HTML view of the frontend has been adapted to be more user-friendly (see screenshot below).
From ESA’s perspective Les commented here that for the vocabulary manager, edits to the frontend styles and templates are intuitive and can be tested in staging environments. But he also stated that for publishing support is important, and that SWC was very responsive.

Example ESA linked data frontend
Of course we asked Les to give a preview of the next steps for ESA. He stated that they include language translation projects so that its vocabularies, especially Schools Online Thesaurus (ScOT), can be accessed by wider markets and by students of other languages. He also stated that PoolParty handles multi-lingual thesauri very well.
We here at SWC are glad to see PoolParty used in more and more applications and usage scenarios. We are looking forward to the next steps that will be done in this project and also to see how the data offered by the ESA vocabulary site is used in other applications.
Thanks to Les Kneebone from ESA for his contribution to his blog post.
-
10:36 Going to SEMTECHBIZ Berlin 2012
» The Semantic PuzzleI went to London last September to visit SemTechBiz UK to represent the Semantic Web Company and PoolParty technologies in the exhibition area of this excellent conference. I had tons of interesting talks at our booth and – although I never found time to visit any talk – I have learned again a lot about customer´s needs.
Compared to ISWC or ESWC, two other major conferences in the area of semantic web, SemTechBiz is clearly the place to go if you´re interested in semantic web applications. Especially in the last three years we have observed a continuous growth of acceptance and demand for semantic web technologies in various industries. For many information professionals and IT managers it has become clearer than ever before that semantic web applications can solve several well-known problems in the areas of enterprise search, data integration, business intelligence and knowledge management.
Thus it was great news for us to have another SemTechBiz conference in place – this time in Berlin, which is one of the most vibrant cities in the world when it comes to innovative web technologies like linked data or open data. And again we will “explore how semantic solutions and linked data are being embraced throughout companies across a diverse range of disciplines and business categories”.
We hope to meet you at SemTechBiz Berlin 2012 (February 6-7) – PoolParty Team is present as Gold Sponsor and is looking forward to meeting you in the exhibition area to talk with you about your semantic web applications.
Related articles- SEMTECH 2011: The Semantic Technology Conference (logicamp.wordpress.com)
- Notes from ISWC 2011 (phenoscape.org)
- I-Semantics: Get in touch with Europe´s Linked Data community! (semantic-web.at)
-
13:43 I-Semantics: Get in touch with Europe´s Linked Data community!
» The Semantic PuzzleIn September 2012 I-Semantics will take place the 8th time. With more than 400 participants every year the conference is one of the largest conferences in Europe in the field of semantic systems and the semantic web. It is held concurrently with the I-KNOW Conference on Knowledge Management and Knowledge Technologies.
I-Semantics is a conference aiming to bring together science and industry:
- To address the needs and interests of industry the iPraxis track presents enterprise solutions that deal with semantic processing of data and/or information in areas like like Linked Data, Data Publishing, Semantic Search, Recommendation Services, Sentiment Detection, Search Engine Add-Ons, Thesaurus and/or Ontology Management, Text Mining, Data Mining and any related fields.
- In the exhibition area I-SEMANTICS 2012 will offer its participants a unique platform either to present latest and leading edge developments or to catch up with the developments of most innovative IT technologies, content applications, knowledge management trends and emerging market opportunities.
- For the first time in 2012 we will bring to you the I-CHALLENGE, consisting of the Best Paper Award, the Best Poster Award, the Best PhD Paper and the Linked Data Cup.
- I-SEMANTICS 2012 proceedings will be published in the digital library of the ACM ICP Series and will contain all accepted papers from the Research & Application track and the I-CHALLENGE. The topics of interest for research and application papers include (but are not limited to): The Web of Data, Quality of Semantic Data on the Web, Corporate Semantic Web, Semantic Content Engineering, Semantic Multimedia and (Linked) Data Ecosystems & Markets
- Looking back at I-SEMANTICS 2011 (semantic-web.at)
- I-Semantics: The Review in a Car – 2011 Edition (semantic-web.at)
- I-SEMANTICS 2011: Best Paper Award & Triplification Challenge Winners (semantic-web.at)
- I-Know & I-Semantics 2011 (socialsemantics.wordpress.com)
-
15:55 Experiences from teaching Linked Data
» The Semantic PuzzleDr. Bernhard Haslhofer works as instructor on Web Information Systems at Cornell Information Science. Just recently he gave a course which examined technologies for building data-centric information systems on the World Wide Web. Semantic Web Company (SWC) had the opportunity to talk with Dr. Haslhofer to examine the question “How to teach Linked Data?“.
SWC: Bernhard, you have been working on the Semantic Web and Linked Data for years now. What is the first lesson you usually give when you try to explain the “Semantic Web”?
Maybe I should first clarify that the course I am co-teaching is not a Semantic Web course. The course is about data-centric Web information systems in general and we spent some classes talking about Linked Data and Semantic Web technologies. We start explaining the origins and the fundamental architectural principles of the World Wide Web and then focus on the data-centric aspects of the Web.
“instead of building isolated repository-centric APIs we could also build a globally connected data graph”
After introducing various data exchange formats (XML, JSON & co.) we teach how Web APIs work, and discuss the design principles of RESTful Web Services. Then the conceptual transition to Linked Data is just a small step, because we can argue that instead of building isolated repository-centric APIs we could also build a globally connected data graph, which is based on a uniform data model and can be traversed and queried using SPARQL.
“DBpedia and all the other existing Linked Data projects and tools that came up in recent years really help in explaining and illustrating how things work”
So, I am somehow approaching the “Semantic Web” bottom-up and concentrate on the “visible” parts of the “Semantic Web” vision. DBpedia and all the other existing Linked Data projects and tools that came up in recent years really help in explaining and illustrating how things work. And last but not least, schema.org and the design of the Facebook Open Graph protocol also show the growing importance of having structured data on the Web.
SWC: At least for non-technicians “Linked Data” sounds very technical. Antoine de Saint-Exupery said: “If you want to build a ship, don’t drum up people to collect wood and don’t assign them tasks and work, but rather teach them to long for the endless immensity of the sea.” Is there an “endless immensity of the sea” you try to bring in as well?
If you can access and combine data from the Web you can answer interesting questions and discover previously unknown relationships between things. We thought the best way to learn about Linked Data is to implement simple demo applications. So we asked the students to think about uses cases that bring some benefit for end users and require data from several Web sources to answer certain questions.
“I think it became clear what it means to work with easily accessible structured Web data opposed to working with unstructured data”
One group developed a service which connects safety records with public transport information. Now users can now easily choose the “safest” bus connection between from and to New York City and other cities. Another group combined public school district information with geographic data, which now allows parents to view statistical information about school districts in New York State by using apps like Google Earth. There are many more examples, but most importantly, I think it became clear what it means to work with easily accessible structured Web data opposed to working with unstructured data.
SWC: Instructing how to use the Semantic Web is not only a matter of slide-decks. It is rather a question of concrete use cases in combination with tool skills. What kind of tool skills should students of information sciences acquire to your opinion?
Collecting and making sense out of data is a common scholarly practice in many research areas and the Web is becoming, or is already, the primary medium for publishing and distributing results. I believe that making data accessible as part of a some research activity will become increasingly important in future and the Web will probably be infrastructure for doing this.
So I think that a student who is working with data should at least know (i) how to retrieve and (ii) how to publish data on the Web in way that others can easily discover, access, and use their data. Linked Data is one possible technical approach for doing that.
SWC: As a European who is teaching and working in the U.S., how do you perceive the different approaches between those two systems when it comes to transfer complex fields of knowledge like the semantic web from universities to business environments?
From the experiences I have made in my previous and current working environments I can only tell that the relations between businesses and universities seem to be tighter in the US. I don’t necessarily mean “formal” bounds between institutions but rather informal relations between people, who understand complex fields of knowledge, both in the academia and in business.
“I assume transferring knowledge between two proxies who speak the same ‘language’ makes it a lot easier”
PhD students, for instance, often work in business over the summer and/or continue their career in the research department of some company. Some continue their cooperation with their former professors and academic colleagues and I assume transferring knowledge between two proxies who speak the same “language” makes it a lot easier.
SWC: What are the most important things which are still missing to make linked data technologies an integral part of enterprise information systems?
Quite often I hear the complaint that major database vendors still don’t provide satisfactory RDF support in their products. I don’t think this is a necessary precondition for implementing Linked Data but for some institutions this seems to be very important.
Many thanks!
Related articles- Linked Learning 2012, collocated with WWW2012 (linkededucation.wordpress.com)
- W3C Library Linked Data Reports (ivan-herman.name)
-
15:26 WordPress plugin to make use of linked data
» The Semantic PuzzlePoolParty Team has recently published an improved version of their WordPress plugin which enables linked data enrichments of blogs. Therefore a SKOS based vocabulary has to be uploaded or retrieved from a SPARQL-endpoint. Users and developers benefit from
- automatic annotation of all blog entries displayed as tooltips
- a comfortable search facility with auto-complete over all concepts from the linked thesaurus including semantic search over the whole blog
- an integrated thesaurus browser, plus
- a corresponding linked data frontend including RDF/XML serialization of the underlying thesaurus + SPARQL endpoint
All details about the new version 2.2.3 can be read here.
Related articles- Using DBpedia to generate SKOS thesauri (ablvienna.wordpress.com)
- PoolParty 3.0 and its all new Linked Data framework (semantic-web.at)
-
15:52 Introducing SKOSsy – generate thesauri on the fly!
» The Semantic PuzzleImagine you could generate any thesaurus you would like for nearly any knowledge domain you can think of with quite a good quality!
Sounds impossible? Reminds you of all the promises made by text mining software which generates “semantic nets” from scratch?Let me introduce you to SKOSsy. I will explain what this web service can do for you:
SKOSsy generates SKOS based thesauri in German or in English for a domain you are interested in. Not any domain but nearly any: SKOSsy extracts data from DBpedia, so it can cover anything which is in DBpedia. Thus, SKOSsy works well whenever a first seed thesaurus should be generated for a certain organisation or project. If you load the automatically generated thesaurus into an editor like PoolParty Thesaurus Manager (PPT) you can start to enrich the knowledge model by additional concepts, relations and links to other LOD sources. But you don´t have to start in the open countryside with your thesaurus project.
Let me give you an example: Imagine you are working for a company which is an international plant builder and you would like to index several thousands of documents the “semantic way”. You have to walk through the following steps:
- Identify proper categories in Wikipedia/DBpedia which describe best what your business or your domain is all about. Those categories should contain pages / resources which are related to the documents you would like to index. For example: [dbpedia.org] or http://dbpedia.org/resource/Category:Industrial_automation
- After you have selected proper categories SKOSsy will traverse DBpedia for you and collect all resources, their hierarchical and non-hierarchical relations, alternative labels, definitions and other properties and put them together as a valid SKOS thesaurus; this step will last a couple of minutes. (Find the resulting vocabulary here)
- Load the resulting thesaurus into PPT, explore it, improve it and enrich it with additional facts.
- After you´re done you can generate a tailor-made text extractor by using PoolParty Extractor (PPX) which is the second component of PoolParty product family
- With PPX and its extraction model especially curated for your special use case you can extract named entities from your documents automatically and index your documents in a meaningful way.
- After a few seconds your semantic search engine is ready to be used. PoolParty Semantic Search (PPS) which is the third PoolParty component will offer some nice facilities like categorized auto-complete, faceted search, content recommendation (similarity search) and smart search suggestions to ease your life as a knowledge worker.
We have constantly discussed the application of thesauri and other knowledge models to improve search over the last years. Many people understood straight away why thesaurus based search is most often much better than search algorithms purely based on statistics. Of course the big contra always was, “the costs are too high to establish a “good-enough” thesaurus or even a “high-quality” one”.
With SKOSsy in place those kinds of arguments become weaker and weaker. To sum up,
- SKOSsy makes heavy use of Linked Data sources, especially DBpedia
- SKOSsy can generate SKOS thesauri for virtually any domain within a few minutes
- Such thesauri can be improved, curated and extended to one´s individual needs but they serve usually as “good-enough” knowledge models for any semantic search application you like
- SKOSsy based semantic search usually outperform search algorithms based on statistics since they contain high-quality information about relations, labels and disambiguation
- SKOSsy works perfectly together with PoolParty product family
If you are interested in the results produced by SKOSsy, just send us a short note about your domain or your project and we will send you an invitation as beta-tester or prepare a demo for you.
Related articles- Geological Survey Austria launches thesaurus project (semantic-web.at)
- PoolParty 3.0 and its all new Linked Data framework (semantic-web.at)
- PoolParty DemoZone Content Extractor Semantic Search Thesaurus Manager (poolparty.punkt.at)
- Query DBpedia for multiple keywords (stackoverflow.com)
-
15:54 Geological Survey Austria launches thesaurus project
» The Semantic PuzzleThroughout the last year the Semantic Web Company team has supported the Geological Survey of Austria (GBA) in setting up their thesaurus project. It started with a workshop in summer 2010 where we discussed use cases for using semantic web technologies as means to fulfill the INSPIRE directive. Now in fall 2011 GBA published their first thesauri as Linked Data using PoolParty’s new Linked Data front-end.
The Thesaurus Project of the GBA aims to create controlled vocabularies for the semantic harmonization of map-based geodata. The content-related realization of this project is governed by the Thesaurus Editorial Team, which consists of domain experts from the Geological Survey of Austria. With the development of semantically and technically interoperable geo-data the Geological Survey of Austria implements its legal obligation defined by the EU-Directive 2007/2/EC INSPIRE and the national “Geodateninfrastrukturgesetz” (GeoDIG), respectively.

Marcus Ebner, from the GBA Thesaurus Editorial Team
The construction of the thesauri has been done using the PoolParty Thesaurus Manager so they all are based on SKOS and fully compliant to the Linked Data principles. Apart from the standard implementation of SKOS some additions were made to the data model using Dublin Core terms for extra metadata and custom sub properties of skos:related to give some semantic constraints to related properties. This basically means that a big effort was put into the integration of bibliographic references for every concept in the data set using dcterms:source. This aims at the requirements of reuse by the scientific community and incorporation in domain specific data sets. On the other hand rdfs:subProperityOf was used to express how international geologic time scales map on regional concepts.
Currently four thesauri have been published, all are available in English and German and can be used under the cc-by-sa license. Also mappings to DBpedia have been made:
With the new PoolParty Release (3.0) the Linked Data front-end has been redesigned and is now highly customizable and extendable. In the GBA Thesaurus Project it is used as an publishing interface for the created controlled vocabularies both for the machine readable RDF version and an custom HTML version for comfortable browsing and searching.

GBA Linked Data frontend
After all it’s satisfying to see a project we’ve supported and worked on for some time now come to live and now we are looking forward to the next steps that will be done in this project.
P.S.: Thanks to Marcus Ebner from GBA for his contribution to his blog post.
-
17:54 TimBL @ Hofreitschule in Vienna
» The Semantic PuzzleRight now, 19:30, Tim Berners-Lee is giving a key note on the future of the internet at the Hofreitschule in Vienna, a marvellous, historic venue in the very city.

His talk is a plaidoyer for an open internet, that works independent from central control and political implications, on top of open standards AND net neutrality. This is especially relevant when it comes to open data, where the social machinery of the web will help to improve many flaws democracy is facing today.
So, what are the implications: Study Web Science! And trigger gentle, non-violent change!
It was a pleasure to listen!
-
11:01 rNews and its benefits for publishers
» The Semantic PuzzleLast Wednesday at the Open House event of the Semantic Web Company in Vienna, Evan Sandhaus, Lead Semantic Architect at NY Times gave a comprehensive and entertaining introduction to rNews and its potential benefits for publishers.
Evan Sandhaus (f.l.t.r) busy preparing his talk in the kitchen of SWC, together with Andreas Blumauer (SWC) and Leo Sauermann (Gnowsis).
Mr. Sandhaus in action.rNews is a RDFa vocabulary, which is basically a carefully selected subset of the very rich IPTC vocabulary and some additional elements that came up during the standardization process. It is now available in version 1.0 and – according to Evan – actively supported by schema.org.
As showed above the data model of rNews is really simple and centered around two classes: the NewsItem and the Concept. This deliberate simplicity is a major advancement compared to standards like NewsML (whose complexity probably prohibited its critical uptake among the news industry). But due to the functional extensions attributed to RDFa, rNews might also be considered more complex than hNews, the microformat equivalent issued by the IPTC in 2009.Evan mentioned three scenarios that might drive the uptake of rNews for the benefit of news publishers:
1) Better news search
rNews allows you to explicate and differentiate various documents elements like, title, author, text body, picture etc., thus giving the publisher better control of what to expose for indexers and web crawlers. This might not just improve the display of rich snippets in the search results of Google and other search engines, but also allow automated population of faceted search and metadata based similarity search.
2) Better ad placement
As rNews can be applied to any kind of news-relevant media irrespective of its format (grafics, audio, video, etc.) the metadata can be used to avoid “unfortunate juxtapositions” between editorial content and ads. Hence, media agencies could profit from this additional data by fuelling their matching algorithms and gain better insight into the context specificities of content items.
3) Better analytics
By improving the semantic granularity of a news item this additional information can be used to carry the web analytics beyond the page level and provide a better insight into usage patterns. The additional data can be applied for visualization and exploration purposes i.e. for search engine optimization, sentiment detection and many more.
This is just a small fraction of things rNews could be used for. All in all it is exciting to see that IPTC has finally started to provide publishers with a standard that is relatively easy to implement and help them to overcome the obstacles of existing technologies without disrupting existing publishing workflows. In multi-sided markets like the news industry this might be a crucial success factor!
-
10:42 IPTC (International Press Telecommunications Council) has ...
» The Semantic PuzzleIPTC (International Press Telecommunications Council) has developed rNews, a set of specifications and best practices for using RDFa to embed news-specific metadata into HTML documents. Just recently the initiative received new momentum when schema.org announced that it has added support for rNews (see press release from Sep. 27, 2011).
Alltogether this is another very important building block for an even broader adoption of the semantic web – and the Semantic Web Company is proud of being able to welcome some of the proponents of rNews in Vienna.
Tomorrow at our first “Open House” event (which is proudly presented by the Vienna Semantic Web Meetup) not only our guest list but also the speakers list is very promising:
- Evan Sandhaus, IPTC Delegate and Lead Architect For Semantic Platforms at The New York Times Company
- Andreas Gebhard, Member of the Board of Directors of the IPTC and a Managing Editor at Getty Images
- Stuart Myles, Lead of the Semantic Web Working Group of the IPTC and Deputy Director of Schema Standards at The Associated Press
- Marco Neumann, Lotico
We will report about the talks on “the day after” but we are sure that this meetup will bring a lot of light into the ongoing discussion why media and press companies finally should start to adopt semantic web standards.
Related articles- Impressions on the Schema.org Workshop (w3.org)
- The Semantic Web Media Summit (theconferencecircuit.com)
- New W3C Community Groups in the Semantic Web World (w3.org)
-
17:57 I-Semantics: The Review in a Car – 2011 Edition
» The Semantic PuzzleContinuing the tradition of last year’s review in a car, the Semantic Web Company’s participants of the I-KNOW / I-SEMANTICS talked about their impressions of the conference while on their way back to Vienna.

Image based on work by Paolo Mañalac
Thomas Schandl: An especially nice thing about this conference is that it’s co-location attracts people from two separate communities: Knowledge Management and Semantic Web. This serves as a natural facilitator for looking beyond the boundaries of one’s own domain and getting more than a glimpse of what’s currently happening in related fields.
That being said one of the most interesting talks I attended was a talk from KM expert Prof. Martin Eppler and his take on “Sketching at Work“, which introduced loads of sketching methods which can help to solve problems, inspire creativity and support communication.
From the Semantic Web side I enjoyed the innovative approach taken by Hasso Plattner Insitute‘s DBpedia powered quiz game Risq!. It is a Jeopardy-like Facebook game, that (besides being fun) sheds insights as to which facts are especially important to characterize a Linked Data resource. E.g. when the system wants you to guess a specific “female politician” would it help you more to know that she is part of the category yago:LivingPeople or would you rather get the hint that she is dbpedia:Chancellor_of_Germany?
By analyzing the logs of the played games, the researchers can find out which triples have more discriminative power than others.Through the many personal encounters I also got a lot of input on which new features would be especially interesting for future versions of PoolParty and what we should concentrate on in the LOD interlinking project LASSO that Bernhard Schandl (Gnowsis / Refinder), Stefan Wunder (Neurovation) and me presented at the I-Praxis track.
Andreas Blumauer: Again, this year was absolutely worth coming to Graz also from a business perspective. For me it was the 10th time going to Graz. When I went to the second edition of I-KNOW in 2001 I remember that nearly nobody has ever heard of “semantics”. When I-SEMANTICS came to Graz the first time, this was in 2007, it was still unclear for most visitors how semantic technologies could contribute to a more efficient enterprise knowledge management. Nowadays, 10 years later, there is another question most prominent:
Which kind of semantic technology is solving my problem?
Being most of the time at our exhibition booth I enjoyed talking to visitors who had very concrete plans & ideas about how to use linked data, text mining or knowledge models for their business. The time when we had to explain what the “semantic web” is all about is over.
Christian Dirschl´s (Wolters Kluwer) keynote on Friday was exactly reflecting this fact: It´s good to see how big players have started to integrate the idea of linked data into their processes already. The days when we had to explain the difference between RDF and XML seem to be over. Or at least almost.
Florian Kondert: It was a vibrant atmosphere for me, since I didn’t make it to participate to just one track, but talking to interesting and interested persons at the booth without one break, instead.
From the participant’s perspective the conference as a networking platform was a huge success – and it definitely didn’t stop at dusk! It is worth pointing out the diverse needs and ideas on semantic use cases, that allow us to learn more with every discussion. The bottom line is that semantic solutions are badly needed for many organisations – and they start to realize, that there are no working alternatives at the moment.
On the other hand it is crucial to show up with real life examples, not just with prototypes that might work tentatively! As providers for semantic solutions we face decision makers on the highest level and they demand high level remedies – so, no time to take a break!
Tassilo Pellegrini: As the conference chair I really had an intense, but all in all very positive time at the conference. Interesting people, inspiring talks and a really good time at the socializing events (greetings to Leo Sauermann & Co. – I enjoyed the drinks!). For a general conference overview read my post from a few days ago.
But there is more to such a diverse conference as just talking about semantics. As some of you might know, beside my interest in Semantic Web, I have been involved in some policy consulting lately concerning the topic of net neutrality. At the conference I took the opportunity to talk to some telecommunications-savvy people and had some really great conversations (Harald … I really enjoyed our discussion!). But to my surprise I had to find out that – especially among the engineering guys – there seems to be very little awareness about the pressing social, cultural and economic consequences that an abandoning of net neutrality will have on the Internet as we know it today. For those readers who are into semantic web but not into the net neutrality discourse I want to reduce it to a very simple formula: without net neutrality you can say goodbye to linked open data. And this should really make us think and act!!
Related articles- Semantic Web Company and punkt. netServices have merged (semantic-web.at)
- Looking back at I-SEMANTICS 2011 (semantic-web.at)
-
12:26 I-SEMANTICS 2011: Best Paper Award & Triplification Challenge Winners
» The Semantic PuzzleThis year the I-SEMANTICS conference gave away prices for the best scientific paper and the most promising triplifications.
The best paper award went to Pablo N. Mendes, Max Jakob, Andrés García-Silva and Christian Bizer for their contribution “DBpedia Spotlight: Shedding Light on the Web of Documents“.
Abstract: The paper impressively shows how Linked Open Data can be utilized as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, the authors developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. They compare their approach with the state of the art in disambiguation, and evaluate their results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of the system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
For the 4th time I-SEMANTICS hosted the Triplification Challenge, an event aiming at stimulating the availability of large quantities of RDF data and showcasing practical applications built on top of them. The Challenge consisted of an unspecific “open data track” and a dedicated “open government data track” for which one winner was selected. The prize money of 1000 Euro each was sponsored by Wolters Kluwer Germany.
The “open data track” award went to Daniel Garijo, Boris Villazón and Oscar Corcho for their contribution “A Provenance-Aware Linked Data Application for Trip Management and Organization“.
Abstract: The authors present El Viajero, an application for exploiting, managing and organizing Linked Data in the domain of news and blogs about travelling. El Viajero makes use of several heterogeneous datasets to help users to plan future trips, and relies on the Open Provenance Model for modeling the provenance information of the resources.
The “open government data track” award went to John Erickson, Yongmei Shi, Li Ding, Eric Rozell, Jin Zheng and Jim Hendler for their contribution “TWC International Open Government Dataset Catalog“.
Abstract: The TWC International Open Government Dataset Catalog (IOGDC) integrates a diverse selection of more than 70 government dataset catalogs from around the world. IOGDC demonstrates a practical dataset catalog metadata model for integrating diverse dataset catalogs collected from the real world and linking those catalogs into Linked Data Cloud. IOGDC’s faceted browsing and search interface provides a scalable and reconfigurable solution for finding and browsing open government datasets which also offers a compelling demonstration of the value of a common metadata model for open government dataset catalogs. We believe that the vocabulary choices demonstrated by IOGDC highlight the potential for useful Linked Data applications to be created from open government catalogs and will encourage the adoption of such a standard worldwide.
All papers are available in the ACM Digital Library.
We thank all participants for their contributions and wish the winners all the best for their future work!
-
9:27 Looking back at I-SEMANTICS 2011
» The Semantic PuzzleFor the 7th time, I-SEMANTICS, the International Conference on Semantic Systems, took place in Graz, presenting latest research outcomes and industry-ready applications to the wider public. Co-located with I-KNOW, the 11th International Conference on Knowledge Technologies, the event proved once again that the interest in semantic information processing is high and of increasing practical relevance.

More than 70 scientific and 40 industry presentations provided a valid overview over current technological and organisational trends in various areas of semantic computing like text mining, information retrieval, visual analytics, semantic content engineering, social semantic web and linked data. Especially the last topic appeared in many different contexts, showing that the linked data paradigm is gaining traction as a horizontal topic that crosses domains and communities.
One of the conference’s unique characteristics is the high amount of attendees from industrial domains, searching for inspiration and solutions for practical problems on the one side, but also for diversification potentials of their business on the other side. In this respect the applied scientific approach of I-SEMANTICS / I-KNOW has proven to be a valid approach to scrutinize academic research against its reusability in industrial settings, transfer knowledge and skills between both communities and provide incentives for cooperative research and project engagement.This cooperative spirit was also represented by the four key note speakers, who took a deliberate practical approach to show how high level research fertilizes organizational reflexivity and triggers change for sustainability on a societal, cultural and economic level. Hence, Daniel A. Heim, Professor at the Computer Science Department of the University of Konstanz, Germany, showed how visual analytics have a stake in solving organizational problems. Gloria Mark, Professor at the University of California, Irvine, USA, talked about the challenges that derive from informational multi-tasking. Stefan Rueger, Professor at the Knowledge Media Institute of The Open University, United Kingdom, gave a talk about “potential, automation and limits of knowledge discovery in the web” and Christian Dirschl, Head of Content Strategy Department at Wolters Kluwer Germany, gave an insight in how one of the global players in legal publishing is utilizing linked data and semantic web technologies to prepare for the next step in web-based business diversification.
The next I-SEMANTICS will take place from September 5-7, 2012 in Graz again. Hope to see you there and enjoy the impressions … -
14:47 PoolParty 3.0 and its all new Linked Data framework
» The Semantic PuzzleThe new major release of PoolParty boasts with new Linked Data capabilities that further unlock the potential that the Semantic Web can bring to improve your metadata management, to enhance your data with external knowledge and to ease data integration efforts within your organization and with your partners.
In PoolParty 3.0 we created a Linked Data interlinking editor, making it easier than ever to add your own lookup and interlinking services (even for non-RDF sources) and made the Linked Data publishing front-end fully customizable in design, layout and regards to which parts of your content will be displayed.
But let’s start at the beginning:
Step 1 – Hook into the Linked Data Cloud!In the era of the rapidly growing Linked Data Cloud your knowledge models don’t need to stay isolated from the outside world anymore. Simply use PoolParty’s new and improved lookup service to find matching resources from the Linked Open Data Cloud (e.g. from DBpedia).
Imagine having different data models that all refer to the same product categories and world regions. Once you have them represented in PoolParty you can use its lookup service to find matching resources from the Linked Data Cloud. In this way you will get globally used identifiers for your product categories and regions, usually in the form of a URI like [dbpedia.org] . This eases your internal data integration efforts, and it can aid the data exchange with partners or customers and enables hassle-free distributed management of knowledge models.
Image 1: Lookup of concept ‘Austria’ and selection of properties and values to be imported
With PoolParty 3.0 we increased the number of included lookup services: DBpedia, Geonames, Wordnet, Umbel, Yago, Freebase, Sindice, dmoz and LCSH – BBC Wildlife, Enis and Gemet are available on request.
Step 2 – Pull in Semantic Data!There is a vast amount of Linked Data out there just waiting to be leveraged for thesaurus creation and extension. To meet that end we had a close look at our interlinking module and decided to enhance it a way that it becomes more of a Linked Data editor.
Once you have a base thesaurus in PoolParty and hooked a couple of your concepts into the cloud as described above, you can proceed to pull in the good stuff that comes with the Linked Data resources you have found.
Image 2: Imported Linked Data for concept ‘London’
As you can see in the image above, you can extend your local thesaurus with labels, definitions and all kinds of other information like e.g. in the case of countries their population, GDP, spoken languages, famous people born there, newspaper articles related to the political situation, and so on.
Now PoolParty 3.0 takes this approach a couple of steps further. You can not only specify which of your local concepts corresponds to which Linked Data resource and grab all semantic information that comes with this resource, but now you are able to selectively pick out the data items you are interested in and even transform the predicates they originally came with. Just switch them to whatever custom properties you created or want to re-use from any ontology (see an example in Image 1).
In this way you can easily enrich your own knowledge models with external information – which in turn can be utilized for better content recommendation, easier data integration and improved search services.
Step 3 – Publish your Linked Data in StylePrevious PoolParty versions already offered the possibility to instantly publish your thesauri, taxonomies or vocabularies and display their concepts as HTML while additionally providing machine-readable RDF versions for them. This means that anyone using PoolParty intuitive GUI can become a W3C standards compliant Linked Data publisher without having to know anything about Semantic Web technicalities.
Of course you don’t need to publish all your valuable models, just choose the parts that safely can be shared with the public and keep everything else behind your firewall, available only to you and trusted partners!In this new release of PoolParty the design of all pages on the Linked Data front-end is now under your full control. You can use your own style sheets and create views on your data with velocity templates. It is even possible to develop project- and thesaurus-specific templates and layouts, so they can have an individual look and display different predicates and their values.
Take a look at PoolParty´s standard linked data frontend!
The following images show a PoolParty default Linked Data page and a custom-made Linked Data page of a PoolParty concept that has some DBpedia info imported.
Image 3: PoolParty default Linked Data page

Image 4: Custom Linked Data page of ScOT thesaurus (courtesy of Educational Services Australia)
Step 4 – Unlock new Linked Data SourcesWith PoolParty 3.0 you are in no way limited to DBpedia, Freebase, Geonames and the other lookup services that PoolParty provides out of the box: you can add your own non-Semantic Web data sources to the mix, thereby enabling you to boldly go where no Linked Data tool has gone before.
Maybe you have a product thesaurus and want to specify which products are related to patents that can be found with Google Patents?
Or you want to interlink concepts from a company taxonomy with related articles from the Guardian’s search service or any other newspaper that provides a search API?All those sources are not available as RDF, so how can you re-use them easily as data sources for Linked Data style interlinking? For such cases PoolParty introduces the Unified Lookup API, which makes it easy to turn almost any third party Web API into a source for interlinking your concepts with third party resources as described above.
This makes it possible to interlink your concepts with many kinds of data out there, be it New York Times articles, UN data, synonym services, abbreviations, press releases, juridical information – or any web API important for your knowledge domain.
That being said, if you have suggestions for additional lookup services that you think are interesting, let us know!
To gain a first hand impression of the new PoolParty just apply for a demo account!
-
14:25 Semantic Web Company and punkt. netServices have merged
» The Semantic PuzzleWe are pleased to announce that two companies which have had already a significant standing within the European Semantic Web scene, are now acting under one brand. The long lasting expertise in developing, programming and integrating linked data technologies of punkt. netServices and Semantic Web Company’s consulting expertise have merged under the resulting label Semantic Web Company.
In 2004 Semantic Web Company was founded as a spin off of punkt. netServices to bring the semantic web and linked data technologies closer to the needs of companies, consumers and the government sector. We have done a lot of basic research those past years, as well as project-pioneering with prospective customers and partners. Finally we have consolidated our knowledge and skills in that field. What was avantgarde in 2004 now has become bleeding edge technology in present days. A good moment to join efforts and bring together the two sisters.
With the new Semantic Web Company, you can count on a team of 20 experienced experts from the areas of knowledge management, enterprise software architecture, search engines, collaboration software, agile web development and – last but not least – the semantic web. We are a powerful partner when it comes to realise enterprise-ready solutions. An enlarged company needs more space, so find our new headquarter on lovely Mariahilfer Street in Vienna in a building designed by famous Austrian architect Adolf Loos.
Read more about our goals and visions online on our brandnew website or get in touch with our team on-site, joining one of our monthly Open House Meetings.
Links:
- Semantic Web Company (Wikipedia)
- PoolParty Website
-
8:19 “Thesaurus based search engines will become main stream in the near future”
» The Semantic PuzzleThe results of the survey titled “Do controlled vocabularies matter?” which was conducted by Semantic Web Company from May until June 2011 are public now. Over 150 participants from 27 countries draw a picture of the current and future usage behaviour in the realm of controlled vocabularies.
Here are three of the most interesting outcomes of this questionnaire – the whole report can be found and downloaded on issuu:
Do you think enterprises and other organizations can significantly benefit from using Linked Data?
The answer is a clear YES. A subsequent question also reveals that all kind of organisation sizes have about the same opinion concerning linked data. Only few people think that linked data is a “niche thing”. In general it can be said, that over 90% of the participants think that most or at least some organisations can benefit from using linked data.Do you think that search engines which utilize thesauri to improve results will become main-stream
The results of this question are amazing: Two thirds of the participants think that thesaurus based search is already or will become main-stream in the near future. Scepticism towards this development seems to be low – at least it can be stated, that a clear majority thinks that thesaurus based search engines will become main stream in the near future.How important is the usage of standards like SKOS for controlled vocabularies?
The results speak for themselves. The majority of the participants are convinced that standards like SKOS are important for their daily work. In August 2009 W3C announced the new SKOS standard – now, nearly two years after, it looks like this standard has well arrived. 48.7% stated that standards like SKOS are very important and 29.1% voted for “relevant”.As an overall result of the survey it can be stated: Semantic Web community has done a great job to convince the controlled vocabulary people to benefit from SKOS and linked data – on the other side only 3-5% are aware of SPARQL as a valuable resource to build standard APIs around controlled vocabularies to lower costs when implementing such knowledge organization systems.
Many thanks to all participants of this survey!
-
8:12 data.wien.gv.at – the process to Vienna’s open data portal
» The Semantic PuzzleOn 17 May 2011 the time has come – the first Open Government Data (OGD) portal of a public administration in Austria was launched – and it was the capital Vienna that did this courageous and so important step in Austria and thereby took the role of a pioneer in the area of open data in our country – and hopefully will act as a model for communities, cities, states and the federal government (also important to be mentioned here is that the Open Commons Region Linz has been the first city government that has announced a data portal in Austria still before Vienna – launch date will be September 2011).
[data.wien.gv.at] is a first well done step in the area of Open Government Data for a modern and open City of Vienna. Open human- and machine readable data in several formats and from several categories (e.g. population, education, budget, leisure time and many more) are availabe for re-use now. Into the bargain available under the CC-BY-3.0 License of Creative Commons.The road to 17th of May 2011 has started about 1 year ago – at least from the pointview of the Austrian (and Viennese) open data community: on the 8th of April 2010 a group of linked open data enthusiasts – representatives of universities, companies and the civil society – invited interested people to come to the 1st Open Government Data Meetup at the OCG (Austrian Computer Society) in Vienna. For talks there were Rufus Pollock of the Open Knowledge Foundation on site in Vienna as well as Stefano Bertolo of the European Commission has been hooked up via skype to shine a light on this – at this time – for Austria and Vienna very new topic of Open Government Data to present their experiences and best practices in the field to about 60 participants. The interest was very high – also on the side of the media – and therefore a basic interest as well as a first braod information in Vienna was built.
Afterwards everything went quickly until the 17th of May 2011 (and also if one year seems to be a long time I do think that it was an enormous performance of all involved parties to manage so much in only one year!) – after the mentioned MeetUp, the OGD Austria was founded – an initiative thats’ objective is to open (linked) government data (non personal) in Austria in human- and machine readable formats for re-use. To do this together with politics, administration, civil society and industry. Other initiatives as open3 as well as established institutions in the area of administration research as KDZ – Zentrum für Verwaltungsforschung or the Danube University of Krems or Joanneum Research – but also companies like the Semantic Web Company or Compass Verlag, and above all lots of representatives of the civil society who were interested in the topic of Open Government Data (it is important to say that in Vienna we do have a very active creative scene and web 2.0 community) did work together to push the field of open data in Vienna / Austria.
In June 2010 the Semantic Web Company (SWC) – with support from above mentioned institutions – submitted a proposal to the technology agency of the City of Vienna (ZIT) to build and implement a bundle of measures for awareness-building activities in the field of Open Government Data in Austria – the project: OGD2011 was born. The authorisation of this project (partly funded by ZIT) for sure helped a lot to inform the relevant stakeholders (politics, public administration, civil society, industry, academia and media) in the mentioned time period and to build awareness about the power, the potentials as well as about the challenges – and the important concrete steps – of Open Government Data!The following measures were implemented and will be implemented in the course of OGD2011:
- Open OGD Austria Stammtisch every second month (meetup, until today only in Vienna)
- 4 Stakeholder Workshops (politics, administration, civil society, industry) in February 2011 to evaluate and identify as well as to discuss the requirements on Open Government Data in Austria from the viewpoint of the respective stakeholder group
- Publishing of the OGD Digest Austria – Information around Open Data in Austria and international in print & PDF (until today 4 editions available)
- Set up and operation of a mailing list as well as a XING group
- Organisation of an open MeetUp on OGD on 15th of Juni 2011 in Vienna
- Set up and operation of open wiki spaces for collection of information and provision of relevant information in the field of Open Data
- OGD2011 Conference on 16th of Juni 2011 in Vienna
- And very important: about 40-50 bilateral talks with representatives of politicians and public administration in Vienna about OGD to raise awareness and clarify misconceptions
- Networking with international initiatives on the topic of open data as the Open Data Network (Germany), the Open Knowledge Foundation (UK) or the ePSIplattform (just to name a few) to ensure continuous exchange on the topic – as well contentwise as about the process for an Open Government Data strategy – to learn from each other and to support each other…
- Furthermore in July/August 2011 the Open Government Data White Book Austria will be published as a fundamental work on open data in Austria
Inspite the OGD2011 project is arranged for the whole country of Austria the participants at the workshops and events were mainly from Vienna – what is not really surprising as most of the Austrian public bodies are located in Vienna and the City and the State of Vienna has a special status in Austria.
In November 2010 another very important step happened becuase without an Open Government Data strategy it is nearly impossible to be implemented – the political YES to Open Data in Vienna in the programme of the government of the new red-green coalition.
Regarding the implementation of data.wien.gv.at the City of Vienna received support by the EU project LOD2 – LOD2 did consulting on the following topics: Open (Government) Data, Linked Open Data, licenses and business models, as well as in the area of data sheets, meta data and URL schemas in the course of the LOD2 Publink Consultancy Services.
I think that in total the following indicators were crucial for the success of the Open Government Data movement in Vienna so far:
- Broad awareness raising at all involved stakeholder groups
- Collaboration of all stakeholders and establishing of an open dialogue between these groups
- Political commitment on the highest level
- High interest as well as engagement on the side of the public administration at the City of Vienna
- High interest and support by the media – most of all by the Open Data Blog of futurezone
- Support of the OGD2011 project by ZIT to enable a basic funding for concrete activities and measures
- Building of a strong community for Open Data and therefore permanent presence of the topic in the public
- Evaluation and representation of potentials and opportunities – but also of existing risks – of Open Government Data in Vienna
- Exchange of knowledge and experiences with international initiatives to learn from each other and use best practices vice versa
- Intense analysis of: licenses, meta data, data description (data governance) and a very well done implementation of phase 1 of data.wien.gv.at by the City of Vienna (with support by LOD2 et al.)
But this phase one of data.wien.gv.at
can only be a start – the City of Vienna already announced continuous exchange between the public administration and the community for further development of the data portal (and today on 26th of May 2011 we had the first meeting with about 50 participants and really very fine discussions about 2 hours long). Further an online survey is planned for summer 2011 (to ask the public for concrete data needs) and an open data challenge is planned for the end of 2011 on the basis of Viennese Open Government Data – and there will also happen something in the area of the scope of the provided data sets (more data will be opened) as well as in the area of the provision of additional data formats and interfaces (along the lines of the EC and UK the City of Vienna wants to follow the path of Linked Open Government Data)….… I am absolutely curious about how the process of Open Government Data in Vienna will go on from here in 2011 and 2012!
Additional Links: [www.wien.gv.at]
Author Martin Kaltenböck is CFO of the Semantic Web Company Wien and co-founder and member of the executive board of the OGD Austria -
8:39 I-SEMANTICS 2011 – Final Reminder for Extended Submission Deadline
» The Semantic Puzzle
This is the final call for papers for I-SEMANTICS 2011. Due to several requests we decided to extend the deadline of the I-SEMANTICS Conference till Monday, May 30 , 2011.The new dates are:
Extended Submision Deadline: May 30, 2011
Notification of Acceptance: June 27, 2011
Camera Ready Paper: July 18, 2011I-SEMANTICS 2011 (www.i-semantics.at) is the 7th conference in the I-SEMANTICS series and takes place from September 7 – 9, 2011 in Graz / Austria. I-SEMANTICS brings together both researchers and practitioners in the areas of Linked Data, Social Software and the Semantic Web in order to present and develop innovative ideas that help realising the “Social Semantic Web” and the “Corporate Semantic Web”.
I-SEMANTICS 2011 will be the host of the 6th AIS SigPrag International Conference on Pragmatic Web as well as the 4th edition of the TRIPLIFICATION Challenge. Further on I-SEMANTICS will be complemented by I-KNOW (www.i-know.at), the 11th International Conference on Knowledge Management. This setup is aiming to reflect the increasing importance and convergence of knowledge management and semantic systems.
The scientific track invites long and short papers along the main topics “Linked Data and Web of Data” to “Semantic Web Applications and Application Building Blocks, Studies, Metrics & Benchmarks”. The papers will be published in the ACM ICPS series. The detailed CfP containing all scientific tracks can be found here: [i-semantics.tugraz.at]
To address the needs and interests of industry the i-Praxis track invites enterprises and public organisations to present industry relevant solutions in the field of semantic technologies. Presenters will be granted free access to the conference and will have generous time slots to present their applications. The presentations will be published on the conference website. Please find more information here: [i-semantics.tugraz.at]
-
15:30 Interview on Enhancing Semantic Web applications with Linguistic Information
» The Semantic PuzzleJohn McCrae (Uni Bielefeld), Elena Montiel-Ponsoda (Universidad Politécnica de Madrid) and Tobias Wunner (DERI Galway) will hold a tutorial at the ESWC 2011 with the title “Enriching the Semantic Web with Linguistic Information“. We had a chance to talk to them beforehand:
Can you please tell us about the aims and purpose of your tutorial and the importance of incorporating linguistic information in the Semantic Web?
With the continuing growth of linked data and semantic technologies the incorporation of linguistic descriptions into Semantic Web resources has become a challenging issue. The integration of linguistic information especially on a multilingual level could greatly benefit Natural Language Processing (NLP) applications. Furthermore, the continuing growth of ontologies for semantic modeling and the use of terminological resources to add human language descriptions has raised the issue of how to add linguistic information to ontologies and linked data vocabularies and to represent models of lexical and terminological information in a way which is compatible with Semantic Web standards. Prominent examples here are, for instance, multilingual language tags in RDF Schema or SKOS’s success in bringing terminological information to the Semantic Web.
In the Tutorial we would like to discuss trends and novel models such as Lemon – the lexicon model for ontologies – to show possible future directions. The tutorial is targeted at researchers and practitioners interested in learning how to enrich ontologies with linguistic information in one or several natural languages and NLP tool developers interested in understanding how Semantic Web resources can be leveraged fro NLP. There will be two hands-on sessions in this tutorial.
Why did you choose to use PoolParty thesaurus management system in your tutorial?
To create terminology models on the web there are only few tools available which are often very technical and not straightforward to use for non-experts. We found that PoolParty in contrast to other SKOS editors has an attractive and usable interface. In addition the web based interface was preferable, as it did not require the participants to download software, the immediate publishing of linked data is more compatible with linked data principles and the tool has similarities to our own tools for working with lemon.
Thank you for this interview!
-
9:09 Seevl: Explore the cultural universe based on semantic web technologies
» The Semantic PuzzleJust recently Alexandre Passant from DERI Galway went public with a new web service called seevl. First impressions after test driving the system reveal that the seevl team is keeping the promises they have made: “Seevl reinvents music discovery. We provide new ways to explore the cultural and musical universe of your favorite artists and to discover new ones by understanding how they are connected. In addition, we let you comment every piece of data about them.”
I was talking with Alexandre and asked a couple of questions:
Q: seevl.net aims to offer a new way of music recommendations. What exactly can the user expect from it?
The main idea is to offer context around the recommendations, while existing systems are opaque, or rely on collaborative filtering techniques. So that a user know why he could / should like X if he’s browsing page about Y. We hope (and we’ve seen it from our user feedback so far) that it can help to discover new bands and hidden connections.Q: Yes, indeed this is something new. Maybe for the typical users this could be too complicated. This brilliant feature should somehow be hidden – working just like a magic button?
So far, we include this in the “why is related” button, but we’re constantly working on the UI / UX. Also, we only provide text for now, but are working on dataviz interfaces.Q: seevl offers for developers a Web API. It seems like you don´t use semantic web standards for that?
We use content-negotiation to provide machine-readable data for every page (search results, entity description, related artists, etc.). If by non-SW standards you mean non-RDF, indeed, we provide JSON instead of RDF/XML or N3, etc. But our JSON integrates URI that you can dereference and follows a similar approach than other existing RDF-JSON serialisation. So, why JSON you may ask. Because our developer target is music hackers, and all APIs from this community (last.fm, echonest, etc.) offer JSON, not RDF. Learning a new JSON schema takes 5 min, learning RDF takes much more.
But we believe that a JSON-RDF serialisation combines the best of both worlds. Actually, we could say we provide our data using standards (we’re giving back a graph that follows the RDF abstract model, with links to dereferencable URIS) but not in a (so far) standardised serialisation.Q: I agree. But mid-term oriented I would go additionally for SPARQL. A lot of people learn how to SPARQL at the moment.
Yes, we have to measure the cost / ROI. Complete SPARQL can lead to complex queries, that’s why they are somehow hidden behind our search interface (that basically construct a controlled SPARQL query). But that could be something provided to advanced customers.Q: seevl.net is based on linked data sets like DBpedia, MusicBrainz or Freebase. Is seevl itself offering Linked (Open) Data? I can also see heavy use of the open graph protocol. How could a facebook application of seevl could look like?
Yes, we provide our data back at http://developers.seevl.net. We’re using the Music Ontology and a bit of other models (FOAF, etc.). So far, the OGP markup is used for Facebook likes – but we are looking at other things that could be built on top of this.Q: Which business model are you following? Can one integrate your service into his shop? would you offer this a cloud service? for how much?
We’ll have B2C (new features on the website are coming soon) and a B2B freemium model. We’re currently identifying how much calls we can support as part of the free-calls per day (so that will indeed be cloud-based, our architecture is on EC2). So, integration of our service / data in shop websites, etc. is definitely what we’d like to see and to feature in our upcoming app-gallery ! The only requirement for data-reuse is attribution and linking-back to the service.Thanks Alex, and I wish you and your team all the best with seevl.net!
-
14:50 Which kind of controlled vocabularies matter?
» The Semantic PuzzleLooking at intermediate results of the Controlled Vocabularies Survey an interesting finding concerns the question which types of knowledge models are currently best fit for actual use in applications.
So far 143 people whose organization already make use of controlled vocabularies answered the question “Which kind of controlled vocabulary do you use or plan to use in your applications?”.
The results so far show that lightweight models like taxonomies and thesauri are somewhat preferred over ontologies:Taxonomies are the favorite, as 73.6% of participants use or plan to use them, followed by thesauri (62%) and ontologies (61.2%), while simple glossaries lag considerably behind with a usage of 31.4%.
This survey will close in about a week, so please take this chance to make your opinions on this topic count! You can find the questions here, it will take 5-10 minutes to answer them.
All participants will gain access to a report with the results within the following month. The most interesting results will be made public on this blog.
-
8:06 I-SEMANTICS 2011 – Extended Submission Deadline
» The Semantic Puzzle
This is the final call for papers for I-SEMANTICS 2011. Due to several requests we decided to extend the deadline of the I-SEMANTICS Conference till Monday, May 30 , 2011.The new dates are:
Extended Submision Deadline: May 30, 2011
Notification of Acceptance: June 27, 2011
Camera Ready Paper: July 18, 2011I-SEMANTICS 2011 (www.i-semantics.at) is the 7th conference in the I-SEMANTICS series and takes place from September 7 – 9, 2011 in Graz / Austria. I-SEMANTICS brings together both researchers and practitioners in the areas of Linked Data, Social Software and the Semantic Web in order to present and develop innovative ideas that help realising the “Social Semantic Web” and the “Corporate Semantic Web”.
I-SEMANTICS 2011 will be the host of the 6th AIS SigPrag International Conference on Pragmatic Web as well as the 4th edition of the TRIPLIFICATION Challenge. Further on I-SEMANTICS will be complemented by I-KNOW (www.i-know.at), the 11th International Conference on Knowledge Management. This setup is aiming to reflect the increasing importance and convergence of knowledge management and semantic systems.
The scientific track invites long and short papers along the main topics “Linked Data and Web of Data” to “Semantic Web Applications and Application Building Blocks, Studies, Metrics & Benchmarks”. The papers will be published in the ACM ICPS series. The detailed CfP containing all scientific tracks can be found here: [i-semantics.tugraz.at]
To address the needs and interests of industry the i-Praxis track invites enterprises and public organisations to present industry relevant solutions in the field of semantic technologies. Presenters will be granted free access to the conference and will have generous time slots to present their applications. The presentations will be published on the conference website. Please find more information here: [i-semantics.tugraz.at]
-
17:15 Semantic Web and Emerging Trends in Scholarly Publishing
» The Semantic PuzzleIn my capacity as one of the Editors-in-chief of the Semantic Web journal (the other one is Krzysztof Janowicz; the journal is published by IOS Press), I was recently invited to talk about the journal at Allen Press’ Seminar Emerging Trends in Scholarly Publishing. This seminar is an annual event which draws decision makers from the scholarly publishing industry to hear about and discuss recent developments and hot topics related to their profession. This year’s event had a session on “Semantic Enrichment”, and one on “Rethinking the Structure of Peer Review.” All presentations, including videos, are available from the Allen Press website.
The invited speaker of the “Semantic Enrichment” session was Pam Harley, Vice President, Product & Market Development of Semedica, a division of Silverchair. Pam gave a high-level account of the possibilities and added value which comes with Semantic Enrichment, in a way suitable for the non-technical audience. I personally benefited particularly from the large variety of reasons for adopting Semantic Technologies in publishing which she presented and discussed in her talk (see also her slides).
My presentation (see also the slides) about the Semantic Web journal was part of the “Rethinking the Structure of Peer Review” session, and was focused on the open and transparent review process which we have adopted for the journal. After the presentation, throughout the event, I received ample feedback and remarks which in particular commended us for setting up a realistic improvement of the review process while avoiding radical changes which are likely to meet too much resistance from researchers. I certainly agree with this assessment. The presentation also contains a bit of information on how the journal is doing (in short: it’s doing great).
The seminar was a very enjoyable experience. In particular, it was enlightening to learn about publisher’s perspectives on scientific publishing, reviewing processes, and emerging revenue models. It was also nice to see that Semantic Web as a technology has a natural place in these discussions and is seeing more and more adoption in practice.
If you’re curious to learn more, have a look at the videos of the presentations.
[Author: Pascal Hitzler]
-
7:23 The hype, the hope and the LOD2: Sören Auer engaded in the next generation LOD
» The Semantic PuzzleThe paneuropean Project LOD2 is one of the biggest projects dealing with linked data. Scientists, programmers and software architects in various european countries are working on the next generation of linked open data. In a series of interviews i’m presenting people working on and with LOD2. As a start, i had the change to talk to Sören Auer, head of the LOD2 project.
Thomas Thurner: Over the recent years the LOD movement gained tremendous momentum. As one of the key players in this area how do you perceive this development? Hype or hope?
Sören Auer: From my point of view the momentum LOD gained is deserved. We should strive for a Web, which is more decentralized, democratic, participatory, transparent and inclusive. Linked Open Data is from my point a key technological building block on this road. However, a lot of work is ahead of us. LOD has to find its way directly into mainstream technology such as CMSes, Search Engines, Web Applications, Mash-Ups and we have to show users and stakeholders the direct added-value of this technology.Thomas Thurner: What is the current state of the LOD cloud from a technological point of view? Where do you see room for improvement?
Sören Auer: Currently, the technological state of LOD seems to be comparable to the early days of the Web. We are still able to draw maps/clouds of the LOD datasets and data links are still sparse and difficult to maintain. This reminds me a lot of the early days of the Web, where we also had problems with broken links (the infamous 404). Later, after content management systems and Web applications automatized the link generation and maintenance this improved a lot and I hope we are on the same road with LOD technologies finding its way into more and more Web systems.
Thomas Thurner: How is the LOD2 project addressing theses issues? What are the project’s key objectives?
Sören Auer: LOD2 is addressing in three ways: First, we develop new research approaches highly relevant for LOD, for example, for Linked Data management, automatic data linking as well as Linked Data enrichment andquality improvement. Second, we implement and integrate these approaches into specialized tools (e.g. SILK, OntoWiki, Virtuoso and DL-Learner) forming together the integrated LOD2 stack. The LOD2 stack can be used by data publishers for the whole life-cycle of Linked Data management ranging from extraction over linking, authoring, enrichment to exploration & search.
Thomas Thurner: What do you think are the most important factors to bring LOD to the masses?
Sören Auer: From my point of view the key factor here is that we manage to integrate the large number of tools and approaches for supporting the Linked Datalife-cycle stages in a synergistic way, where each aspect adds value and triggers a number of other improvements. For example, the establishing of a new data link has a direct effect on search & exploration of Linked Data. We have to directly show these kind of benefits to users so they receive and instant gratification for contributions to the Web of Data. Semantic Wikis, such as Semantic MediaWiki and OntoWiki, are already nicely working in this direction. An application with an enormous potential to bring LOD to the masses would be the creation of a distributed, social semantic network. With OpenId, WebId, FOAF, Semantic Pingback most of the building blocks are available, but the final step integrating these into an easy-to-use social networking application still has to be done.Thomas Thurner: Compared to other semantic web approaches linked data principles seem to be rather easy to understand. On the other hand some argue that the “linked data cloud” is a big heap of data which cannot be used for professional purposes. What is your point of view?
Sören Auer: Of course the currently available data is not useful for all potential usage scenarios. However, already now Linked Data can be used for many interesting applications: For example, we just completed the development of a prototype for a large search engine, where users searching are assisted with comprehensive background information obtained from the Linked Data Web. For this use case, information available as Linked Data is already very valuable and useful. The criticism of LOD being a “heap of data” also reminds me a lot of the early days of the Web, where people raised similar criticisms for the Web being a medium of un-professionalism. Later it turned out that, of course there is a lot of amateurism, but as Wikipedia impressively demonstrates the working together of many amateurs with the right tools can in the end outperform few professionals.
Thomas Thurner: Linked Data could also become a new paradigm for light-weight enterprise data integration. What are the biggest obstacles today for linked data to being accepted by the business community?
Sören Auer: Using Linked Data for data integration in large enterprises has an enormous potential. Just last week I was invited for a workshop with the IT department of one of the top car makers and the people responsible there for data integration were extremely excited about the opportunities of Linked Data in the large heterogeneous enterprise with more than 3000 different backend systems. Linked Data technologies can easily fill the gap between unstructured Intranet search and expensive & complicated Service-oriented Architectures. Compared to SOA, Linked Data is a pay-as-you-go strategy, where data integration can be performed incementally and in sync with the requirements and evolution of the data structures in the enterprise. In order to realize this vision, we need to continue the maturation of enterprise Linked Data tools – the availability of PoolParty, Sindice Enterprise Edition, Virtuoso, TopBraid are already important steps in that direction.
Thomas Thurner: Automatic mechanisms to curate linked data and to make alignments between datasets possible play a crucial role for the next phase of linked data economics. Which technologies will play a central role? What will be the most critical point – do you see a “wisdom of the crowd” playing a role in this game?
Sören Auer: Definitely! Tapping the wisdom of the crowd for mapping & linking has a huge potential, which is currently unused. We started working in that direction with DBpedia Live and the DBpedia mapping Wiki. In order, to make it really easy for people to contribute we have to dramatically lower the barrier to contributing to the alignment process. In LOD2 we also plan to enable users to create mapping and links between dataset by simply giving examples of correct links and evaluating some automatically generated ones.Thomas Thurner: At the moment governments all around the world start to publish open data, more and more stakeholders start to understand the benefit of open linked data. On the other hand enterprises haven´t even started with this topic. What could be the dynamics which will trigger projects in industry sectors like financial industries which will make use of open data principles?
Sören Auer: Making statistical and financial information available in structured form and as Linked Data could have a enormous impact in this regard. With the DataCube vocabulary effort a first step in this direction was made, but it would be nice if this vocabulary would get an official stamp of a standardization organization such as W3C. Since the benefit of publishing statistical and financial data in structured form, e.g. as Linked Data, is visible most when done by many, this could be also facilitated by government regulations and industry best-practices.
About INFAI
The Institute for Applied Computer Science (InfAI) at Universität Leipzig hosts research groups in service sciences, knowledge engineering and management as well as natural language processing. The approximately 20 researchers of the Agile Knowledge Engineering and Semantic Web (AKSW) research group at InfAI headed by Dr. Sören Auer are establishing theoretical results and scalable implementations for the field. Particular emphasis is given to areas such as ontology creation and
manipulation, knowledge extraction, ontology learning and information & data integration on the Semantic Data Web. The implemented tools and services (such as DBpedia, OntoWiki, DL-Learner and LinkedGeoData) developed by the group enjoy considerable popularity.About Sören Auer
Dr. Sören Auer leads the research group Agile Knowledge Engineering and Semantic Web (AKSW) at Universität Leipzig. His research interests include semantic data web technologies, knowledge representation, engineering & management, usability, agile methodologies as well as databases and information systems. He aims to combine strong theoretical results with high-impact practical applications. Sören is author of over 50 peer-reviewed scientific publications resulting in a Hirsch index of 15. Sören is leading the large-scale integrated EU-FP7-ICT research project “LOD2 – Creating Knowledge out of Interlinked Data”. Sören is founder (respectively co-founder) of several high-impact research and community projects such as the Wikipedia semantification project DBpedia or the social Semantic Web toolkit OntoWiki. He is co-organiser of several workshops, programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010 and ICWE 2011, area editor of the Semantic Web Journal, serves as an expert for industry, the European Commission, the W3C and is member of the advisory board of the Open Knowledge Foundation.
-
8:52 Controlled vocabularies: “Data integration is king”
» The Semantic PuzzleJust recently a survey about “Controlled vocabularies” and their significance for enterprise information management has started. Until today 143 participants have responded and completed the survey at least partially. To give a first example what was found out, I would like to take a closer at the question: What are the main application areas of controlled vocabularies from your perspective?
A bit surprising is the intermediate result, that it´s not “Semantic Search” or “Support of multilingual applications” which was considered to be the most important application. Instead of this it turned out that “Data Integration” is king:

The bar graph shows the weighed value of each application candidate (1.0 would be a 100% acceptance that this is an important application area of controlled vocabularies). Regarding the top candidate “data integration”- 57,4% said “very important”
- 29,8% “relevant”
- 7,4% “somewhat relevant”
- 2,1% “not relevant”
- 3,2% “Don´t know”
If you don´t think this should be the final result, please help to get a better overview of what´s going on in the controlled vocabulary community. The survey is open until May 18th, 2011 – all participants will gain access to a report with the results within the following month. Most interesting results will be made public on this blog.
-
12:59 Linked data based thesaurus management in collaborative settings
» The Semantic PuzzleThe creation and management of controlled vocabularies in companies often takes place in a distributed manner. Different departments in different branch offices often rather create their own vocabularies, than have one large central knowledge model, where everyone contributes.
How to model divergent views on one concept?
Such a central model is not only much harder to manage, but there is also the general problem that differerent departments like marketing, quality assurance, R&D, etc. will have divergent views on the model and its concepts. These different perspectives on one and the same concept are hard to unify in a single model.
Think of a company that sells mobile phones and wants to create a model of its line of products. It wants to utilize this model in the context of its online shop as well as in the context of its user support forum. While the structure of the model (i.e. the relationships between the products) might be very similar or the same in both contexts, there will be differences in which properties of the products are actually relevant in the respective contexts.
In the model of the marketing department there might be a concept for a “Phantastax StamiMaxx” cell phone with a definiton “The StamiMaxx has a powerful battery and is great for professionals who travel a lot”. They might relate it to manufacturer “ACME Corporation” and to several concepts representing different features like “Android OS”, “Multi-touch touchscreen”, etc.
The very same phone has different properties that are interesting from the Quality Assurance departement’s perspective. They might call it by a more specific name like “Phantastax i3000 StamiMaxx S”, have a different definition for it like “3G cell phone implementing the new WTF3000 protocol, …” and relate it to concepts representing known problems and their solutions.Now they face the task to integrate these different models, as it is not desirable to use a bunch of isolated models within one company.
Support of collaborative work on distributed models
To support this kind of collaborative work on distributed knowledge models, we would like to link the concepts of the models, just as is we link documents in the World Wide Web. Fortunately the Simple Knowledge Organisation System (SKOS) offers mapping properties that can be used to define relationships between concepts from different knowledge models.
E.g. when we want to say that concept “Phantastax StamiMaxx” in the product line thesaurus refers to the same real world entity as concept “Phantastax i3000 StamiMaxx S” in the Quality Assurance thesaurus, then we can use skos:exactMatch to express that. If we want to express that the concepts are merly similar, skos:closeMatch could be used.
The other SKOS mapping properties express a hierarchical (narrowMatch, broadMatch) or an associative (relatedMatch) mapping relation between concepts from different concept schemes. With those we can say that my Samsung Galaxy concept has a skos:broadMatch “Smartphone” in the product line vocabulary and a skos:relatedMatch “ACME Corporation” in a controlled vocabulary about Tech companies.
Modularisation of knowledge models
In this way SKOS thesaurus management systems like PoolParty make it possible to modularise knowledge models, represent concepts in their different contexts and consequently enable collaborative work on those models: The marketing guy can work on his model with the concept properties focused on sales without disrupting the work of the quality assurance expert on her own thesaurus. Later one or both of them can create the skos:exactMatch link between the concepts that are the same, like seen in the “Exact Matching Concepts” box in screenshot of PoolParty below.
Enrich your knowledge: Get connected with the LOD Cloud
Going a step further the models could be connected to external knowledge, e.g. a source from the Linked Open Data (LOD) Cloud. Once we establish links to LOD hubs like DBpedia, we can import additional information for their concepts or use it to establish whether similar concepts from different models really refer to the same real world resource.
-
9:48 REMINDER: I-Semantics 2011 — Call for Papers
» The Semantic PuzzleI-SEMANTICS 2011 (www.i-semantics.at) is the 7th conference in the I-SEMANTICS series and takes place from September 7 – 9, 2011 in Graz / Austria. I-SEMANTICS brings together both researchers and practitioners in the areas of Linked Data, Social Software and the Semantic Web in order to present and develop innovative ideas that help realising the “Social Semantic Web” and the “Corporate Semantic Web”.
I-SEMANTICS 2011 will be the host of the 6th AIS SigPrag International Conference on Pragmatic Web as well as the 4th edition of the TRIPLIFICATION Challenge. Further on I-SEMANTICS will be complemented by I-KNOW (www.i-know.at), the 11th International Conference on Knowledge Management. This setup is aiming to reflect the increasing importance and convergence of knowledge management and semantic systems.
The scientific track invites long and short papers along the main topics “Linked Data and Web of Data” to “Semantic Web Applications and Application Building Blocks, Studies, Metrics & Benchmarks”. The papers will be published in the ACM ICPS series. The detailed CfP containing all scientific tracks can be found here: [i-semantics.tugraz.at]
To address the needs and interests of industry the i-Praxis track invites enterprises and public organisations to present industry relevant solutions in the field of semantic technologies. Presenters will be granted free access to the conference and will have generous time slots to present their applications. The presentations will be published on the conference website. Please find more information here: [i-semantics.tugraz.at]
Important Dates:
- Submission Deadline: April 30, 2011
- Notification of Acceptance: May 30, 2011
- Camera-Ready Paper: June 30, 2011
- I-SEMANTICS 2011: September 7 – 9, 2011
-
14:18 Florian Bauer: I like to view “linked data” as a “single worldwide API”
» The Semantic Puzzle
Florian Bauer is REEEP’s Operations and IT Director, responsible for the overall operational management of the organisation, the product management of reegle (the search engine for renewable energy and energy efficiency) and the management of the IT landscape of REEEP.PoolParty Team had the chance to talk with Florian about reegle – information gateway on clean energy.
Could you please give us a brief overview over reegle – what are the targets you are pursuing with this platform?
The main aim of the reegle information gateway ( [www.reegle.info] ) is to provide a one-stop gateway to comprehensive, high-quality and up-to-date information on clean energy. By making this information accessible to stakeholders in the field around the world, and by presenting it in a user-friendly and intuitive format, reegle directly helps to facilitate the transition to low-carbon energy.
The website provides information on renewable energy, energy efficiency and climate change and their various sub-sectors at a global level, and some reegle services actually combine raw data sets from several different sources, put these datasets into context and thus provide enriched information.
reegle is an offshoot of the Renewable Energy & Energy Efficiency Partnership (REEEP), a non-profit, specialist change agent aiming to catalyze the market for renewable energy and energy efficiency, with a primary focus on emerging markets and developing countries.
The new reegle data portal (data.reegle.info), launched in 2011, has established reegle as a publisher and consumer of Linked Open Data in the energy sector. It provides key clean energy datasets free for re-use using Linked Open Data W3C standards.
reegle consists of two components: one is the semantic search engine ( [www.reegle.info] ), the other is the linked data portal ( [data.reegle.info] ) – What are your target groups, and which typical problems of the clean energy domain can you solve with these services?
For reegle.info, our target groups are primarily project developers, financiers and government policy-makers. These users can access high-quality information on clean energy-related issues with the set of tools we provide: a special web search, a catalogue of more than 1700 key stakeholders, a map view for geographical browsing, a clean energy glossary, and an energy country profiles function.
The energy country profiles are typical of what we’re trying to achieve. Here, we take information from many different providers and combine it all to present one comprehensive information dossier on renewable energy and energy efficiency in that particular country. This means that in one location you have the country’s most important energy-related information ranging from key statistics, and current regulations to key players in the energy field in both public and private sectors.
For our data portal, the target group is a more technical one: primarily IT developers and open data specialists who want to create new mash-ups and integrate data from reegle into other websites. One of the first using these reegle data sets is the OpenEI.org website, another key portal in the energy field.
Open data is not the same as linked open data. Why did you choose to build your services around W3C´s linked data paradigm and/or standards like RDF?
Tim Berners-Lee once mentioned that he likes to compare the progressive ways of offering data with the “stars system” used to rate hotels. You get:
* for making data public (in any format)
** for machine-readable formats (structured data)
*** if the data is offered in a non-proprietary format
**** if you use URIs to identify things, so people can point to your datasets
***** for linking to other people’s data to provide contextSo, as you can imagine, our goal is for reegle to be firmly in the 5-star category, and to establish reegle as an avant-garde tool in energy data.
I also like to view “linked data” as a “single worldwide API”. If the old web was like a huge book, the new semantic web is like a huge database, and SPARQL is the way to ask for information – by sending a query through the SPARQL Endpoint. RDF is the language that offers all possibilities to describe a given dataset with all of the necessary information, including any links to other datasets. Therefore RDF data and SPARQL endpoints provide a powerful tool to find and filter datasets and are crucial, base parts of the semantic web’s architectural layers. On reegle the SPARQL endpoint and the description of the structure of our RDF files is online on our clean energy open data portal.You also decided to build a SKOS based domain thesaurus for clean energy which now plays an important role to improve the search experience at reegle.
Which experiences have you gained so far from this effort? Which obstacles did you have to overcome?The SKOS-based renewable energy thesaurus can be seen as the “heart” of reegle as it provides the basis for a lot of related services in reegle, including the refinement suggestions for search results, the auto-completion options and the glossary links between defined terms and their synonyms and related terms.
We decided to use SKOS because we think it is the best language for building a formal and controlled vocabulary for thesauri in a semantic web context, without adding too much complexity. Although it is a simple language, you really still need IT experts to use it to build a thesaurus – domain experts with additional IT skills (hard to find!).
So in our case, we decided to use a scalable and easy-to-use thesaurus server called “PoolParty”. Using this system drastically reduced the complexity, and allowed us to concentrate on the actual building of the thesaurus with our domain experts, and to spend less time on transferring the knowledge into data sets.
What are your future plans with reegle?
Currently we’re working on restructuring the site to better highlight our new added-value services such as the clean energy country profiles. We are also planning to further develop our thesaurus to include climate-compatible development terms and we’ll soon release a wordpress plug-in to insert this thesaurus into clean energy blogs. One of the most exciting projects we are actually working on is the development of “dossier pages”, where we will provide relevant information to several topics mashed up on one page using semantic web technologies. This is part of the EU funded SCMS (“semantic content management system”) project.
-
11:00 I-SEMANTICS 2011 — Call for Papers
» The Semantic PuzzleI-SEMANTICS 2011 (www.i-semantics.at) is the 7th conference in the I-SEMANTICS series and takes place from September 7 – 9, 2011 in Graz / Austria. I-SEMANTICS brings together both researchers and practitioners in the areas of Linked Data, Social Software and the Semantic Web in order to present and develop innovative ideas that help realising the “Social Semantic Web” and the “Corporate Semantic Web”.
I-SEMANTICS 2011 will be the host of the 6th AIS SigPrag International Conference on Pragmatic Web as well as the 4th edition of the TRIPLIFICATION Challenge. Further on I-SEMANTICS will be complemented by I-KNOW (www.i-know.at), the 11th International Conference on Knowledge Management. This setup is aiming to reflect the increasing importance and convergence of knowledge management and semantic systems.
The scientific track invites long and short papers along the main topics “Linked Data and Web of Data” to “Semantic Web Applications and Application Building Blocks, Studies, Metrics & Benchmarks”. The papers will be published in the ACM ICPS series. The detailed CfP containing all scientific tracks can be found here: [i-semantics.tugraz.at]
To address the needs and interests of industry the i-Praxis track invites enterprises and public organisations to present industry relevant solutions in the field of semantic technologies. Presenters will be granted free access to the conference and will have sufficient time to present their applications. The presentations will be published on the conference website. Please find more information here: [i-semantics.tugraz.at]
Important Dates:
- Submission Deadline: April 30, 2011
- Notification of Acceptance: May 30, 2011
- Camera-Ready Paper: June 30, 2011
- I-SEMANTICS 2011: September 7 – 9, 2011
-
18:33 Hjalmar Gislason: “What I call the emerging field of Data Market.”
» The Semantic Puzzle
Semantic Puzzle: What’s the business idea behind datamarket.com? Whom do you expect to pay for what? Hjalmar Gislason: From the end-user perspective its easiest to describe datamarket.com as a search engine for statistical data, a “Google for statistics” if you will. Any data that is already available open and for free out there will still be open and free on DataMarket, just easier to find, use, compare and download from a single source. While the audience for a search engine for statistical content is obviously way smaller than for text content, a significant part of that audience is business users, looking for data for business reasons. This means that there are more direct and lucrative methods to monetize the usage than simply contextual ads – especially in reselling access to premium data. This is a market that already turns over billions of dollars annually, but is as far from any of the “2.0 world” as one could possibly imagine (think Bloomberg, Reuters, FactSet). We believe there is an opportunity to disrupt a part of their business with a freemium approach, and furthermore open up the data market by reaching a business audience outside the narrowly defined financial user base that these companies cater to. There is data out there – free and premium alike – that can help almost any business make better plans and decisions. Connecting people and businesses to the data that they need will release phenomenal value. Tapping into just a fraction of that will be a hugely successful business for those that get it right.
Open Government Data, and Open Data provided by the corporate sector, stimulate an upcoming market segment: Commercial Open Data Services. The islandic StartUp datamarket.com is on of the emerging companies in this field. Thomas Thurner from Semantic Web Company had the chance to talk to Hjalmar Gislason, founder and CEO of datamarket.com.
Semantic Puzzle: Can you tell me a bit about the technological framework behind datamarket.com? How is the content from third parties is feeded into the system, and which APIs do you use? As you provide mainly XLS and CSV, have you thought, to provide data also als XML in future? Hjalmar Gislason: The backend system is written in Python. We read data from the sources in various different formats, ranging from Excel files and even scraping of web pages to proprietary APIs and Web Services. The data is then stored in a normalized format in a Postgres database that we’re using in a pretty unique way to be able to efficiently store the billions of time series and fact values that the system will eventually hold (currently at around 100 million time series and 600 million fact values). The web site is also written in Python, using the Django framework, but also making use of a lot of javascript libraries (and a bunch of our own code) to allow for an exciting user experience. We’re currently using a Flash-based solution called amCharts for the charts, but have already taken some steps to replace that with our own solution that we’ve written on top of the excellent Protovis visualization library. While you are right that the export formats we provide for end users are XLS, CSV and images (for exporting the graphs), our REST-ful API actually supports XML and JSON formats as well. So we already provide data as XML.
Semantic Puzzle: As you for sure know Tim Berners-Lee’s 5-stars scheme for OGD-Providers. Where do you se your own service in this framework? Hjalmar Gislason: Any fact value, time series and data set on DataMarket is “addressable” with a direct URL using our API. In that sense, all the data on DataMarket is four-star data according to Berners-Lee’s definition. In many cases we’re integrating to data that is only one or two star data, so just by integrating it into our system we’ve moved it a few notches up that ladder. In some cases we’ve even been helping organizations publishing data for the first time, taking the data from 0 to 4 stars in one go. We’ve been toying around with several ideas that would take – or enable users to take – the data all the way to 5-star status, but that’s still just on the drawing table.
Semantic Puzzle: You re-use a lot of Open Data comming from the Island Government. Is there also a state-owned Data Portal for Island, or is your service a “commercial replacement” for such a public effort? Hjalmar Gislason: There is no government-operated data portal in Iceland, and to my knowledge there are no plans for implementing one yet. Sadly there are several more pressing issues in terms of eGovernment here that take higher priority. We don’t see our efforts as a replacement for such a portal, but we have managed to fulfill a little part of that role when it comes to statistical data. We’ve also been really vocal about the benefits of open data and among other things been influential in launching an open data wiki - opingogn.net (Icelandic only) – that exmplains the concepts with examples and use cases and attempts to list in a directory listing as many sources of government data as possible. There is some movement, but as an open data enthusiast I’d really like to see things happening faster. As a matter of fact I think there are reasons for Iceland to be extra enthusiastic about open data to increase transparency and restore trust after the crash of the banks and the economic system in 2008.
Semantic Puzzle: A lot of commercial Open Data Services (Socrata, Factual, Google …) are evolving at the moment. What do you think, which development this market segment will face in the next month and years, and are you able to list your sight on the crucial factors for such business? Hjalmar Gislason: I’ve been writing quite a lot up on the developments in this industry on our blog. One of the things I’ve written the most about is what I call the Emerging field of Data Market“. I define “data markets” as “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable – and often unified – format.” Many of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers. As there are several players in this space already, I believe we’ll see many of them try to differentiate themselves in 2011 by focusing on specific types of data. There are definitely opportunities in building specialized data markets for geospatial data, for statistics and for enormous scientific data sets – to name a few types – and each comes with their own challenges, target audiences and preferred approaches. In the spirit of doing one thing and doing it well, I think most of these projects will want to see success in one such segment of the market before generalizing – or consolidating.
The interviewee: Hjalmar is a successful entrepreneur, founder of three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after their acquisition of his search startup – Spurl. Hjalmar offers a mix of business, strategy and technical expertise. -
15:39 Vienna Semantic Web Meetup – the next season
» The Semantic PuzzleStarted mid 2009, Vienna Semantic Web Meetup (VSWM) goes now in it’s third year. Hosted by various partners, from media to culture and from corporate to academic, this regular gathering now counts over 200 members. As it is a good tradition at VSWM, people from abroad are visiting by, giving input and new insights. Also the next season of VSWM will bring this mixture of international connection and informal meeting in putting two upcoming topics onto the agenda.
Digital Identity on the Semantic Web
Thursday, April 7, 2011While recent developments in ICT make it easier for companies and consumers to reach each other, they can also scatter your personal information more widely, making life easier for criminals. On the other hand public institutions and government agencies are collecting personal data too. So personal data is processed without the consensus (or even the knowledge) of the respective citizen. As we know, leaks in this field may unleash sensible personal data as well. The misuse of personal data can be restricted – this is a challenge to both, the technological and the juridical domain. This meetup takes a look on how Semantic Web Technologies can take over its responsibility in this emerging field.
- Christof Tschohl (BIM)
Ludwig Boltzmann Institute for Human Rights - Mischa Tuffield (Garlik)
A Standards-based, Open and Privacy-aware Social Web (W3C)
>> read more, and register for free
Portals, Apps and Visualizations for Open Government Data
Wednesday, June 15, 2011Picking up Keith Andrews suggestion, this is a MeetUp focusing on tools, services and projects dealing with Visualization, Apps-creation and Portals/Catalogs for Open [Government] Data. As this MeetUp is on the eve of Austrians first Open Government Data – Conference (OGD2011) we expect to meet experts ans enthusiasts from Austria and abroad.
- Keith Andrews (IICM)
Institute for Information Processing and Computer Supported New Media at Graz University of Technology - Andreas Blumauer (SWC)
Storing, searching, serving Open Government Data – getting an overview on the growing market for open data solutions
- Christof Tschohl (BIM)
-
14:08 Transforming spreadsheets into SKOS with Google Refine
» The Semantic PuzzleLooking for high quality enterprise vocabularies we recently turned our attention to the Global Industry Classification Standard (GICS), which is an industry taxonomy designed to categorize any private company. It was developed by Morgan Stanley Capital International and Standard & Poor’s and is mainly used by the global financial community to aid in the investment research process.
It is available for download as .xls spreadsheet files in several languages. Of course it would be much better to have this valuable taxonomy in a standard and machine-readable format. The Simple Knowledge Organization System SKOS is a perfect fit for a taxonomy like GICS. But how to turn a spreadsheet into SKOS with minimal manual effort?
I chose to try Google Refine for this task, as recently a promising RDF extension had been released by DERI‘s Fadi Maali and Richard Cyganiak.
Google Refine is “a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases”. Previously it was known as Freebase Gridworks which is now further developed by Google since its acquisition of Metaweb.

Google Refine UI
Refine is a very useful tool to filter and consequently transform rows, colums and cells according to customizable patterns.
After applying all necessary transformations to the spreadsheet one can edit the “RDF Skeleton”, where the columns can be mapped to literals, RDF properties and RDF classes (which can be imported from their namespaces).

Editing the RDF Sekeleton
Once you got your valid SKOS model ready you can export it in RDF/XML or Turtle format. Then you may want to load it into an ontology editor like Protégé or a thesaurus management tool like PoolParty in order to build upon it or connect it to other knowledge models. With PoolParty the GICS taxonomy can also be utilized to tag and categorize documents, provide semantic search and facetted navigation and it can be published as Linked Data without further effort.

GICS loaded in PoolParty
Working with Refine and its RDF extension was easy and fun. It’s even possible to isolate and save the transformation steps done with Refine, so one can re-apply them on similar structured spreadsheets. This came in very handy as GICS is published in nine languages and as many separate, identically structured spreadsheets.
-
14:59 Drupal and the Semantic Web – Interview with Stéphane Corlosquet
» The Semantic PuzzleStéphane Corlosquet has been the main driving force in incorporating Semantic Web capabilities into Drupal. In the recent release of Drupal 7, Semantic Web technologies became part of the core of this popular CMS, which is used to power at least 1% of all the world’s web sites.
Drupal is the leading CMS when it comes to implementing Semantic Web standards. What are the reasons for this, what makes Drupal such a good fit for Semantic Web technologies?
Historically, Drupal is known to be web standard compliant. It supported the RDF-based aggregation format known as RSS 1.0 as early as in 2001, which was later upgraded to RSS 2.0. The Drupal community prides itself in valid HTML code, not only for the code generated by Drupal, but also by taking the extra step of automatically fixing faulty HTML entered by its users. Drupal has been using XHTML since its version 4.0 in 2002. The next logical step beyond XHTML was to add a layer of semantics with the RDFa standard, a W3C recommendation published in 2008.
There are definitely many reasons that contributed to the addition of RDFa into Drupal 7. The first comes from the Drupal project lead, Dries Buytaert, who is passionate about the web and open source. Secondly, the growing Drupal community is very web savvy and includes many experts from different backgrounds in accessilibity, CSS, HTML, security etc. As a result, every release of Drupal includes many latest standards. The community meets twice a year at conferences (DrupalCons), thes events play a great role in hashing out what technologies or designs will be incorporated into the next version of Drupal. Because of the flexibility of its internal architecture, Drupal is able to keep up with the latest web standards. Content in Drupal is very structured and provides site administrators with a user interface to build the site structure they want, using entity types, content types, fields and taxonomies for categorization. When it comes to other CMSs, Joomla!’s community appears to be more fragmented with a core software that is not as extensible as Drupal and WordPress is more of a blogging platform, so turning it into a full blown CMS can be challenging. Both WordPress and Joomla! are in fact adapting the concept of Drupal’s Content Construction Kit (CCK) to their software but they have not yet reached the same level of maturity as Drupal.
A common objection to the adoption of Semantic Web technologies is that the learning curve is steep and that it is too complicated for many web developers to get into it. How can Drupal 7 change that? Which features accessible for the average web site operator will it offer?
Semantic Web technologies don’t have to be complicated when applied to simple use cases! We purposely chose only of a subset of semantic web technologies to integrate into the core of Drupal, keeping the learning curve for the Drupal developers and users as low as possible. The main technology is RDFa which includes the notions of vocabularies (a schema, or collection of attributes) as well as Compact URIs (CURIEs) which make the authoring of RDFa easier. In fact, some web developers might have come across these notions before when working with Dublin Core in the meta tags as such dc:title or dc:date.
Which benefits will web site owners get when they switch to a semantics enabled Drupal 7?
Google and Bing increasingly rely on machine-readable structured data from the websites that they crawl. The design of Drupal 7 embeds semantic meta data that makes machine-to-machine (M2M) search native for a Drupal 7 website. RDFa can add value by giving search engines more details such as the latitude and longitude of a venue for display on a map; or providing the ISO date format for localization and proper display in the search results for different countries.
What are your hopes regarding the development of other applications that either provide or consume data from D7 sites? Which improvements of standards, best practices or (lightweight) ontologies in the Semantic Web community would you like to see?
Services like Sig.ma are already able to collect semantic data from different sources and display it in new ways in the form of mash-ups. Eventually, these services that consume semantic data will not be just Drupal specific, as more platforms jump on the semantic web band wagon. What I hope to see as improvements or best practices in the future are more well-maintained vocabularies. Many of the existing vocabularies are over engineered, some fail to de-reference properly. Their is also some work to be done in order to improve the tooling made available to web developers as well as introducing the simple concepts of Linked Data to web developers via easy to read documentation.
Thank you for this interview, Stéphane!
-
8:38 EU-Report on the requirements for a paneuropean Open Government Data Portal
» The Semantic PuzzleThe recently published report on a hearing of an experts in Luxembourg this November, provides a snap-shoot on the discussion if a central open data infrastructure may make sense. The experts group list several positive effects like union-wide comparability of some government data set, as well as the role of being motor for national and regional initiatives. It is stressed several times, that a swift progress, in coming those plans reality, is crucial for success.
Read more at: Report – Technical workshop on the goals and requirements for a pan-European data portal
-
13:29 Open Intranet
» The Semantic PuzzleThe following blog post was used by Andreas Blumauer as a basis for a talk at TEDxVienna on Monday, November 29, 2010:

Open Data, Open Government, Open Source, Open Innovation – “Open” everywhere. Today I want to talk about another “Open something”: The “Open Intranet”. This might sound a bit radical but it will also help to reflect a little bit on the term “open” in general.“Open Intranet” – isn´t this a contradiction by definition? What is understood by “Intranet”? It means a network of computers and users “within” some organisational boundaries. But boundaries don´t necessarily have to be closed as nature teaches us: Organisms aren´t closed systems. A watch would be an example for a closed system but living organisms tend to be open – to survive. Of course they aren´t totally open, in systems theory we are talking about systems which are structurally coupled with their medium when we refer to this special kind of openness. As an example, an immune system, having learned to recognise a class of virus it will remain sensitive to that and similar viruses in future. In contrast to this, imagine a fly walking over a painting of Rembrandt: Since the fly isn´t structurally coupled to the cultural space of human aesthetics it is not “open” to the beauty of Rembrandt´s work.
When we think of today´s intranets, we can see that they tend to be isolated from the world wide web, they don´t seem to perceive the internet as their medium. From a user perspective, those two systems aren´t connected to each other. Typically, when working on the intranet we jump from time to time to be in the “internet mode” and start to Google something, we copy it, jump back and paste it into the intranet. It´s the user who is the only part of the whole system connecting the internet with the intranet. Isn´t this exhausting for us?
And now I start with the good news: Intranets all over the world start to open up, slowly – but they do. It seems like the “pressure” from “the outside” just became too huge. In the first instance it seems that it´s not the data and the information which will “break” in, it´s rather the “cool functions” which web apps offer and which we (as digital natives) would like to have in our intranets too. We want:
- better search,
- more possibilities to interact with information,
- integrated views instead of jumping around,
- and we want more possibilities to self-serve our extensive hunger for more and well structured information.
On the information level intranets are still rather conservative: Typical pieces of information already “injected” from the web into an average intranet would be:
- weather forecasts,
- stock exchange rates,
- time zones and
- jokes.
How could companies use the web to inspire their employees (without opening up totally), how could the web “inject” the right amount of information into an intranet to make an enterprise portal as vivid as the web is being perceived by today´s typical end-user. How could this tremendous amount of data and knowledge on the web be “structurally” coupled with intranet repositories and workflows? What are the advantages a company could gain from publishing (at least some) data on the web?
Open Intranet View more presentations from ABLVienna.Let me give you a few examples for intranet apps which have started to consume other information than jokes from the web:
- Enterprise Mashups: Combine CRM systems with social networks like LinkedIn
- Open innovation: Let´s bring the knowledge of consumers and producers together and improve certain products and services. As an example, just recently after BP´s oil spill more than 40.000 people came up with ideas on how to clean up the oil, more than two dozen were deployed to help clean up the oil
- Content Augmentation: Enrich content which is being edited, let´s say in an enterprise wiki, automatically with some background knowledge from Wikipedia or with news from a news company
Finally I will also give you two examples for use cases where companies expose and publish internal data on the web (without violating privacy) and benefit from it.
- Wisdom of the crowd: The Canadian gold mining group Goldcorp made 400 megabytes of geological survey data available to the public over the Internet. They offered over $500,000 to anyone who could analyze the data and suggest places where gold could be found. The company claims that the contest produced 110 targets, 8 million ounces of gold, worth more than $3 billion.
- Prize economics: Netflix, a movie rental service in the US has published data for a contest to improve their recommender engine. One team out of 50.000 contestants after nearly 3 years has improved the existing recommender engine by more than 10% and won 1 Million dollar
To end with a conclusion: What Tim Berners-Lee has demanded in one of his famous TED talks was “raw data now!”. It has started to become reality. Just think of all the “Open Government Data Initiatives” around the globe which were initiated since then. Now companies with a “Web DNA” have started to understand the value of open data and to contribute their “5 cents” to the global “open data cloud”. I think this will not only be of value for many companies but also will increase tremendously the chances to resolve some global problems in the near future.
-
6:35 data.reegle.info – Linked Open Data on Clean Energy
» The Semantic PuzzleFollowing the worldwide trend of Open Government Data as well as Linked (Open) Data the reegle.info team has decided to launch a reegle data portal in November 2010: data.reegle.info.
The idea of providing raw data (first mentioned by Sir Tim Berners Lee in the course of the W3C Linked (Open) Data movement) for free and unrestricted re-use follows the idea and objectives of the reegle.info information system as the single point of access for worldwide clean energy data (renewable energy as well as energy efficiency).On data.reegle.info you can find data on stakeholders in the clean energy area as well as country (energy) profiles from the 1st day of the launch in November 2010 – later on the reegle.info team will open up its renewable energy and energy efficiency thesaurus (SKOS format) for public re-use and continuously will open up and provide more and more clean energy data on data.reegle.info. As license for data.reegle.info the Open Government Data License for public sector information is used. data.reegle.info follows W3C standards and recommendations for Linked Open Data as well as Open Government Data.
For developers data.reegle.info have created a comprehensive developer guide as well as a SPARQL endpoint as the central API to the reegle.info data. So the the reegle.info consortium hopes that data.reegle.info initiates a lot of new (data) mash ups as well as innovative apps using data.reegle.info.
-
7:38 Les Kneebone: “Semantic web technologies are one solution to linking education data in Australia”
» The Semantic PuzzleLes Kneebone is Project Manager at Education Services Australia Ltd.
Among other projects he is responsible for Schools Online Thesaurus (ScOT).PoolParty Team asked Les a couple of questions about thesaurus management, linked data and the semantic web. Here is a short summary of this interview:
Why did you choose thesauri to organize your information? What kind of problems are you able to solve with this approach?
A thesaurus approach was chosen rather than a subject headings approach because we assumed (and continue to assume) that post-coordinate indexing will drive vocabulary-assisted discovery.
Which role does SKOS and/or Linked Data play in order to achieve your goals?
ScOT concepts are now published as URIs. This approach solves the problem of different ScOT versions in disparate systems.
What are the most important values you generate for your stakeholders? What kind of applications can be built or have been built on top of your thesauri?
The Achievement Standards Network (ASN) provides a model for profiling curriculum statements and linking those statements to education resources using various rdf vocabularies. By profiling curriculum statements to learning resources, more precise matching is achieved.
What are the most important arguments to use Semantic Web standards and linked data, especially in education?
The Australian education sector is characterized by many disparate systems in different education jurisdictions. Semantic web technologies are one solution to linking education data in Australia.
Why did you choose PoolParty to manage your thesauri?
We had already identified SKOS as an important standard for ScOT so it was natural to select PoolParty as a our new thesaurus management tool.
What are your future plans and next steps? How do you manage to get your thesauri used, how are you going to build an “eco-system” around your work? (Do you plan to publish ScOT on the LOD cloud? Under which licenses?)
Our vocabularies are currently for non-commercial use and we don’t anticipate any change to the license at this stage. The ScOT license requires attribution, permits derivatives that must be shared, and is for non-commercial use.
-
18:58 Reasonable Minutes from ISWC2010
» The Semantic PuzzleI find it quite clearly noticeable that ontology reasoning is slowly making its way into mainstream. I begin seeing more and more applications – and industry investigations – picking up ontology reasoning in a matter-of-fact way. It seems that the bickering between scientists whether ontology reasoning is needed and/or useful is simply ignored when it’s about applications. And I very much welcome this. The “why” question is no longer important. In fact, even the “how” question isn’t. It’s being used – although sometimes perhaps not in an entirely conscious way, or in a way in which traditional reasoning applications would have been set up. And I very much welcome this as well.
I’m not talking about the fact that 2 out of 3 shortlisted papers for the best paper award at ISWC2010 are reasoning papers (which continues an established trend) – the winner has not been announced yet, there’s one day of the conference still ahead. Rather, I found it noticeable that reasoning prominently popped up in the first in-use-track session (and that wasn’t artificially arranged – in fact the first session was on life sciences applications). Another, less obvious case in point was the excellent keynote given by Evan Sandhaus on how nytimes.com utilizes semantic technologies. Among other things, they used GeoNames for inferring that news from Rome are also news from Italy, and they used Freebase for equating different identifiers for entities (in this case, politicians). Both of these were not explicitly executed or identified as reasoning steps, but this is only a matter of algorithmization. Conceptually, this is ontology reasoning at its simple best: The derivation of implicit knowledge by automated deductive means is reasoning, whether you are aware of it or not.
Talking about applications – Tania Tudorache from Stanford Biomedical presented the ongoing work on ICD-11, which centrally utilizes WebProtege and OWL. I think that this work is completely underappreciated by the Semantic Web community, perhaps because they are not aware of the impact of this. The ICD classification of diseases is the world-wide manual for medical diagnostics, which means that Semantic Technologies – in a rather invisible manner, as it should be – result in something which will be used by millions of physicians world-wide in their everyday work life. That’s what I call dissemination into practice!
By the way – in the context of such trends, it strikes me as oddly outdated to hear panel comments like “OWL still needs to show its worth – what can it do what you cannot do with rules?” It’s about time we stop bickering and pushing our pet paradigms and simply make things work and improve. (And no, I didn’t bother to comment during the panel. A discussion like this is futile, and I think more and more people are realizing this now anyway.)
Another keynote, by mc schraefel, very nicely also put applications into perspective. And highlighted some of the shortcomings of the currently hyped Linked Data. (Don’t get me wrong – Linked Data is extremely necessary for the Semantic Web on several accounts, but there are indeed lot of issues with it which we need to face.) Interestingly, the reactions I heard were mixed – but in an unexpected way. On the one hand, there was wide positive reaction that this was an excellent keynote with a very important message (which is also my take). On the other hand, I heard voices saying that we already know these and other problems with Linked Data, so there wasn’t really any useful content in the talk. I’m rather happy, though, that I didn’t hear anybody disagree with the general message.
Another very notable presentation, as part of the Semantic Web Challenge, was by Deborah McGuinness, on the data.gov work at RPI. The scope of dissemination is simply impressive, and another milestone in the making of the Semantic Web.
What else? The Semantic Web journal’s first Editorial Board meeting took place at the conference (the first issue will be out shortly). My showcase volume of our book was not stolen this time. And there were a considerable number of very interesting-looking papers in the reasoning sessions – all of which I regretfully missed because I was tied up in parallel events. I’m looking forward to reading the papers, though.
On the culinary side, I have to say that I was a bit disappointed. Actually, the food was very good, but at previous conferences I’ve visited in China, it was much more exotic (from a European perspective, anyway) – perhaps the reason for this was that these other events I’ve been to were mainly Chinese, with only a few international guests. And, certainly, the cuisine was not at all as bad as the internet connection at the conference center. But we’ve already become very accustomed to having Semantic Web conferences with too little bandwidth, so it’s kind of expected anyway.
[Author: Pascal Hitzler]
-
16:56 Paneuropean Open Government Data Survey – join now!
» The Semantic PuzzleLOD2 project is currently circulating a survey aimed at people interested in open government data. If you are interested in government information (whether as a publisher, producer, reuser or consumer) LOD2 – team would be very grateful for 10-15 minutes of your time to let them know about what you would like to see from the technology developed by LOD2.

You can find the survey at survey.lod2.eu
The survey will be open until the 17th December 2010.Very much appreciated, is any help in forwarding this to relevant colleagues or suggestions for people this should be to, and for any blogging/tweeting to make sure as many potentially interested people as possible have the opportunity to respond! If you have any questions or issues about the survey please don’t hesitate to contact Martin Kaltenböck <m.kaltenboeck –at– semantic-web.at> or Thomas Thurner <t.thurner–at– semantic-web.at>.
-
9:37 Open World Assumption revisited – What have the Semantic Web and Document Management in common?
» The Semantic PuzzleJust recently I visited DMS Expo in Stuttgart/Germany which claims to be “Europe’s leading trade fair and conference for enterprise content, output and document management”. It was a large trade show, and of course I didn´t expect to see the Semantic Web playing a central role there but on the other side it became much clearer what´s still missing in most enterprises of today to be an “Enterprise x.0″: Open Minds which consider digital contents as a source to create knowledge out of it.
What´s obvious for most of the Semantic Web evangelists, isn´t clear for at least 75% of all exhibitors (and their clients) at DMS Expo. For these people who are dealing with core systems of today´s enterprise stacks it´s not quite clear that documents could be a valuable resource for enterprise knowledge management. They still focus rather on the basic idea that documents have to be revision-proof, they should be long-term archived and should be put into a safe. That´s quite the opposite of how content is organised in a (Corporate) Semantic Web. In such an environment each little piece of information at least has the potential to get linked with another piece of information.
Open World Assumption is not only about the way we put ontologies in place.
It is also about the basic assumption that people intend to get their content published and linked in a way that this creates an extra value for their colleagues and their organisations.
Documents are containers, and containers tend to be put into containers which are even bigger. In a world where documents are the atomic elements to get information organised the question always is: What should be in there?
On the Semantic Web information is no longer locked inside documents, same with wikis: The idea is to organise every little piece of information in a way that it can improve constantly because it´s out on the (corporate) web. Part of this evolution are mechanisms which help to get pieces linked in a meaningful way. On the Social Semantic Web this job is partly automatically executed and partly done by human beings. In this world which is based on the assumption that people would like to have their information out there on the stage the question always is: How can this piece of information get linked to other pieces in a meaningful way? Which metadata should be put on top?
It´s the people who make the difference, and for many of them there is still no “business use case” based on the “Open World Assumption”.
Here is my proposal: Appreciating each one´s work as a valuable resource for the whole organisation!
And I can hear the question already: Great, but how can I put this into my Excel?
-
13:06 KiWi Software Package Released – Call for KiWi Snow Camp
» The Semantic PuzzleThe 14th of October 2010 was a very special date for the KiWi project: After more than two and a half years of development version 1.0 of the semantic collaborative knowledge management software was published. To celebrate that, the project organized a release party in the planetarium in Vienna, Austria. It was a fine evening that featured speeches of Ross Gardler (Vice President Community, Apache Software Foundation) and David Ayers (Free Software Foundation Europe), followed by a demonstration of KiWi by Sebastian Schaffert (KiWi Project Lead). KiWi, the Open Source development platform for building Semantic Social Media Applications, offers features required for Social Media applications such as versioning, (semantic) tagging, rich text editing, easy linking, rating and commenting, as well as advanced “smart” services such as recommendations, rule-based reasoning, information extraction, intelligent search and querying, a sophisticated social reputation system, vocabulary management, and rich visualisation.
To make sure, that KiWi does not die, after the closure of the EC-funded periode, the project makes effort to form a community. The release party was thus also an opportunity to get in touch with the project team. Another opportunity to get in touch with the Software and it’s developers behind is in February next year. When KiWi Snow Camp will gonna be somewhere in the Salzburg mountains.
The KiWi projects sponsors ticktes to participate in the camp for all those- which have a good idea on how semantic technologies can make social media hit the target?
- and are inspired by the possibilities of the KiWi platform?
Together with the KiWi Team participants will meet in February 2011 in Salzburg’s mountains to develop ideas, programm, discuss and develop amazing new pieces of code – and of course enjoy the skiing experience. Not to mention receive the glory of recognition from others in the open source communities and within the broader semantic web community.
How to get my trip to the KiWi Snow Camp?
You will need to register as a participant for the KiWi Developer Challenge. Please email kiwimail@kiwi-community.eu to register your intention to participate in the Challenge; if you are not already registered on KiWi Community site, please do so and include a brief biography.
Visit the KiWi Snow Camp page for more details…
-
21:17 Marrying ARML with Linked Data
» The Semantic PuzzleFirst of all, since ARML (augmented reality markup language) is based on KML and KML uses „Placemarks“ (which all have corresponding identifiers) as basic entities, these could be identified quite easily via URIs within the W3C Resource Description Framework (RDF).
Another basic concept of KML is „Point“. Geo RDF provides properties like „geo:long“ or „geo:lat“ which express longitude and latitude of a POI and thus makes it possible to uniquely identify certain points on a map using RDF standards.
Thus it is possible to map the geo conventions of ARML to the geo conventions of the Semantic Web which are mainly based on Geo RDF.
As soon as a placemark has received a URI it is also possible to expose it as linked data and interlink it with repositories like Geonames, DBpedia or LinkedGeoData (which is based on Open Street Map) to generate Linked Geodata.
ARML makes it possible to link / make a relation between a „Provider“ and a „Placemark“. Thus it is also possible to use a URI to describe a provider and link it to a placemark using the typical triple-struture imminent to RDF.
OpenARML/Wikitude uses tags to describe certain things. These tags are currently represented as literals (strings), seperated by commas. This poses that obstacle that these tags can hardly be processed by machines. With RDF each tag would be assigned a URI, thus changing it from a literal to a resource, which further can be represented in SKOS/RDF, another Semantic Web specification of the W3C.
ARML/Wikitude also offers attributes to describe POIs like phone, URL, email, attachment etc. which all of them could be represented by Semantic Web defacto standards like FOAF, SIOC etc.
Summing up, ARML/Wikitude documents could relatively easily be transformed in valid RDF / Linked Data Graphs. This could help to enrich AR-applications with data from the LOD (Linked Open Data) cloud. Vice versa data generated by ARML applications could be exposed as Linked Data.
As a pragmatic approach we recommend to generate on top of existing Wikipedia URLs the corresponding DBpedia URIs which would directly transform ARML placemarks into a resource as part of the existing LOD cloud.
As soon as placemarks are mapped to DBpedia additional metadata could be added to a placemark which opens up totally new perspectives on content enrichment in ARML environments enabling new and exciting AR-applications.
We want to thank Martin Lechner from Salzburg based Mobilizy for a fruitful discussion we had so far on this topic.
BTW: check out the paper by Reynolds et al. (2010) from DERI on “Exploiting Linked Open Data for Mobile Augmented Reality“
-
8:30 I-Semantics 2010 Review
» The Semantic PuzzleVery nice conference review of I-Semantics 2010 from Dan Leahu’s point of view: http://danleahu.com/series/isemantics/
Thanks Dean for the flowers & credits!
-
7:46 Open Government Data around the World
» The Semantic PuzzleMy colleague Thomas Thurner has put some Open Government Data initiatives on a map to get an overview what is happening around the world in this area. If you have any additional initiatives to add, please feel free to contact Thomas (office @ semantic-web DOT at) to get access – he will support you in this!
This is an important step to collect initiatives worldwide so we can follow our objective to build a network of OpenGov initiatives around the world in the course of building a focal point for the LOD topic – many thanks for your support!
(This initiative is part of the LOD2 project which was recently started in Leipzig.)
-
12:28 LOD2 Kick Off Meeting in Leipzig
» The Semantic PuzzleFrom September 6 – 8, 2010 we kicked off the LOD2 project in Leipzig / Germany. LOD2 is funded by the European Commission within the 7th Framework Programme (Grant Agreement No. 257943) consisting of 10 partners from 7 countries. Its main aim is to integrate and syndicate linked data with large-scale, existing applications and showcase the benefits in three application scenarios: 1) Media & Publishing, 2) Enterprise Data Management and 3) Open Government Data. The resulting tools, methods and data sets have the potential to change the Web as we know it today. (You can download the project flyer here.)
The first day was dedicated to the general introduction of the project partners which are Universität Leipzig (Germany), Centrum Wiskunde & Informatica (Netherlands), National University of Ireland in Galway (Ireland), Freie Universität Berlin (Germany), OpenLink Software (United Kingdom), Semantic Web Company (Austria), TenForce (Belgium), Exalead (France), Wolters Kluwer Deutschland (Germany) and Open Knowledge Foundation (United Kingdom). Below you see a picture of the kick off team.
During the morning of the second day a first introduction to the technical components took place. The picture below shows an abstraction of the LOD2 high level architecture.
Orri Erling and Hugh Williams from OpenLink introduced Virtuoso, which will be used as one of the storage technologies in the LOD2 stack. The second knowledge store technology will be MonetDB introduced by Peter Boncz from CWI. Both systems will also be used as a kind of benchmark laboratory for hosting and querying linked data.
Christian Bizer from FU Berlin talked about Silk and D2R. In combination they will be used to discover relationship and similarities between entities within different linked data sources – generally called identity resolution.
Giovanni Tummarello from DERI introduced Sindice and Sig.ma under the aspect of how to update, validate and reuse data that is available on the web and support the production of professional, collaboratively governed linked data especially for enterprise use. Beside that an important aspect will be how to handle the high amounts of generated data. So according to Giovanni scaling the infrastructure and the use of appropriate hardware will be central in bringing the Sindice index into enterprise stacks i.e. as an approach for lightweight data consolidation purposes.
Norman Heino from AKSW University of Leipzig introduced OntoWiki and Semantic Pingback. Ontowiki will be used at the interface layer for producing, annotating, browsing and querying linked data and presenting it to the enduser in various GUIs. Semantic Pingback’s aim is to interlink the Web 2.0 with the Semantic Web by backwards compatible RPCs (remote procedure calls). It detects new typed or untyped external links, manages the GET and POST commands and it takes care of server autodiscovery.
Andreas Blumauer from Semantic Web Company demonstrated PoolParty as a smart editor for metadata in enterprise stacks. Like Ontowiki PoolParty also addresses the interface level of LOD2 especially when it comes to generate, edit and link metadata to documents primarily based on SKOS. PoolParty deliberatelly uses Thesauri as a mapping layer to discover similarities of documents, generate tag recommendations for their annotation and publish used vocabularies as Linked Data.
In the afternoon we continued with individual breakout sessions to discuss work package interdependencies and start profiling the use cases and requirements eingineering in more detail.
The third day started with an introduction by Stefano Bertolo – the responsible scientific project officer from the EC side for the LOD2 project – who pointed out that the LOD2 project is an important one for the European Web of Data and the EC among others specially is interested in the Open Government Data use case of LOD2.
After this introduction talks of the 3 Use Cases were presented by A) Jonathan Gray (OKFN) about the Open Gov Data use case followd by B) Amar-Djalil MEZAOUR (Exalead) speaking about the Linked Business Data use case and C) Christian Dirschl (Wolters Kluwer) having a talk about the LOD in the publishing & media industry use case.
Central to the success of LOD2 will be a smart handling of all the integration issues which will come up in the course of the project. Here Tenforce, an integration specialist from Belgium, will have the lead. CEO Bastiaan Deblieck gave a detailed outlook on the methodologies and he presented a nice and comprehensive overview how the integration issues will be approached from a SCRUM perspective.
After a presentation about LOD2 project dissemination, training and community building activities by Martin Kaltenböck (Semantic Web Company) there were serveral discussions going on until the successful kick off meeting was closed by project lead Sören Auer (Universität Leipzig) at 04.00pm of 08 September 2010.
Updated news information can be accessed on the LOD2 project website as well as on the LOD2project twitter stream (and on twitter using #lod2)…
Stay tuned!
-
13:16 The review in a car
» The Semantic Puzzle
Imagine the following: A car full of Semantic Web Experts is on it’s way back from Graz. They hand around an iPhone to record some first impressions about the just ended 6th International Conference on Semantic Systems, I-SEMANTICS. So, the car was a Volvo, occupied by Thomas Schandl, Helmut Nagy, Tassilo Pellegrini and Andreas Blumauer.Andreas Blumauer: “I think this year’s I-Semantics was a big step forward. I had the impression that a lot of industry representatives are looking again for serious solutions there, after they have had already “burned their fingers” with the first-generation semantics. The now presented 2nd generation is much more about running applications and less unproven concepts.”
Tassilo Pellegini: “This is what I also noticed this year. People build now on a solid common knowledge on the topic and are much more aware of the possibilities of the existing technologies and methods. And as this conference also where visited by a quite international crowd, a very homogeneous discussion incorporating a lot of the international trends was possible. So the developed sight on the topic was quite clear. In this respect, the keynote of Peter A. Gloor was a notable and impressive look into the very next future. It seems that the powerful technique of Cool Farming will be on our agenda in the next years, when we talk about prognosis tools, sentiment analysis, aggregated expert’s data, etc.”
Andreas Blumauer: “In terms of a look into the very next trends, also the Keynote of Rafael Sidi was impressive to me, as he draw a real amazing picture how his company Elsevier is on the way to transform their whole business model into a new paradigm. And this gives a glue that LOD has now arrived in real industry environments.”
Tom Schandl: “I think this real-live-aspect of the Semantic Web was one of the unspoken focal points of the conference. In this respect Richard Cyganiak had a brilliant talk about how corporate data integration can benefit from RDF-Solutions, because a RDF based data concept can be developed step-by-step in contradiction to a “conservative corporate data integration” which always goes with a general redesign of the whole data-structure of a company. Richard calls this “pay as you go” – and I think this is what the industry looks for.”
Helmut Nagy: “This is also my impression, standing a lot on our booth. The industry looks for very concrete semantic solutions – and some of them are already there and ready to use. So – to carry some house advertising – our PoolParty demozone was very well recognised and commented. And this is not only because I served Tropical Banana Cocktail there.”
So the talk went on, in the car, at the blogosphere in the Semantic Web Community.
-
7:50 Winners of Triplification Challenge 2010
» The Semantic PuzzleOn Friday, September 3, 2010 the winners and honorary mentions of the 3rd Triplification Challenge have been awarded at the I-SEMANTICS conference in Graz. This year’s challenge consisted of an Open Track and a special Open Governement Data Track.
In total we received 28 submissions from which 15 nominees have been selected by the organizing committee. In a second round an international reviewing team of scientific and industrial experts elected the 3 equal winners and 3 honorary mentions. The winners were each granted a prize money of 1000.- Euro which was sponsored by Wolters Kluwer Germany, Semantic Universe and Semantic Web Company.
In the Open Governement Data Track the awards went to:
Winner:
Self-Service Linked Government Data with dcat and Gridworks
Richard Cyganiak, Fadi Maali and Vassilios PeristerasHonorary Mention:
Linking Open Government Data: What Journalists Wish They Had Known
Christoph Boehm, Felix Naumann, Markus Freitag, Stefan George, Norman Höfler, Martin Köppelmann, Claudia Lehmann, Andrina Mascher and Tobias SchmidtHonorary Mention:
Geographical Linked Data: a Spanish Use Case
Alexander De Leon, Victor Saquicela, Luis M. Vilches-Blázquez, Boris Villazón-Terrazas, Freddy Priyatna, Oscar Corcho, Carlos Buil, Jose Mora and Jean Paul CalbimonteIn the Open Track the awards went to:
Winner:
Live Open Linked Sensor Database
Danh Le Phuoc, Josiane Xavier Parreira, Michael Hausenblas, Yuanbo Han, Manfred HauswirthWinner:
Twarql: Tapping Into the Wisdom of the Crowd
Pablo Mendes, Pavan Kapanipathi and Alexandre PassantHonorary Mention:
BibBase Triplified – http://data.bibbase.org
Christian Fritz, Oktie Hassanzadeh, Yang Yang, Reynold Xin and Renée J. MillerThe winners f.l.t.r. Christian Dirschl (Sponsor Wolters Kluwer), Alex Passant, Richard Cyganiak, Danh LePhuoc and Pablo N. Mendes
Cordial congratulations from the organizing team & look out for the 4th Triplification Challenge in 2011, which will again take place at the I-SEMANTICS conference, September 7 – 9, 2011 in Graz / Austria.
-
4:44 Why SKOS thesauri matter – the next generation of semantic technologies
» The Semantic PuzzleAs a matter of fact still a lot of “semantic technologies” are around which do nothing else than pure statistical analysis of text. Sure, this is better than simple full text search but there are still quite a lot of opportunities to improve search, especially when it comes to more sophisticated applications like “similarity search”, the search for similar documents to enable cross-reading or recommendation systems.
Providers of first generation semantic technologies calculate rather basic “semantic networks” by co-occurency analysis which results sometimes in disappointing results. Bearing in mind that Google just bought a company (“Google buys Metaweb“) which has been working on one of the largest knowledge bases in the world, we could assume that some of the last miles towards a semantic search engine can be achieved by applying thesauri or other structured knowledge bases.
A demo application was recently developed by PoolParty team where one can find out how thesauri will improve search results on top of second generation semantic technologies. With PoolParty SKOS based controlled vocabularies can be managed and also can be enriched with linked data. PoolParty Tag & Content Recommender analyzes virtually any text or website to recommend corresponding tags, concepts from (in this case) STW (Standard Thesaurus für Wirtschaft), DBpedia and respective articles from Wikipedia.
STW which was developed by the German National Library of Economics (ZBW) provides vocabulary on any economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.
This background knowledge is used in this demo app to improve the search for similar documents dramatically:
Similarity between two documents can be calculated not only on a key-phrase basis but also on a rather conceptual basis. Even if two documents do not have one single word or phrase in common they can be identified as “similar documents”.
This can be achieved because thousands of important relations between economic subjects are represented in the domain specific thesaurus. Thus, in this special case best results are achieved with documents from economics (for instance from Econstor) but of course for other recommender systems thesauri from other domains can be used instead of STW.
Nevertheless, also this approach can be improved and this development is underway: SKOS thesauri enriched with Linked Data do an even better job. This kind of third generation semantic technologies are currently developed by LASSO project and LOD2 project, two innovative projects in the area of linked data and the semantic web.
-
16:25 The Semantic Web journal – half a year later
» The Semantic Puzzle
The journal “Semantic Web – Interoperability, Usability, Applicability” – in short: the Semantic Web journal – was launched 7 months ago, sporting a transparent open review process. Pascal Hitzler is one of the Editors-in-Chief (the other one is Krzysztof Janowicz). He answers some questions on the motivation, setup, and future plans of the journal. (Pascal also wrote the questions and this intro, so it’s really a fake interview. But it seemed an appropriate literary form …)Question: Why did you launch yet another journal on Semantic Web?
Hitzler: Because the community is growing and the need for publication outlets grows with it. I heard the objection that there weren’t enough quality papers for all the journals, but I don’t think so. It’s just that most of the quality papers still end up in journals which are not dedicated to the Semantic Web as such.
Personally, my desire to start a new journal began when I wanted to do a special issue on Semantic Web reasoning in some other, established, journal, and the Editors-in-Chief basically replied with a lapidary “Is there anything to report?” I didn’t push the case back then (though I probably should have). But this and similar experiences made me think about scientific publishing from a different angle, a normative one: What should scientific publishing in our field look like? The journal gives me a possibility to realize some of my answers – or at least to go a few steps into the right direction. So when the opportunity arose to set up this journal with a well-known publishing house (IOS Press) and with a co-Editor-in-Chief (Krzysztof Janowicz, a strong proponent of open and transparent reviewing) who I knew would also put a maximum of energy into the venture, it was simply too good an opportunity to let it pass. However I also realize that the reality of scientific publishing can change only slowly, and that it needs time and gradual improvements. We can’t do it all at once.
Question: Your journal uses an open review process. What is that and why?
Hitzler: Open reviewing, in the sense we use it for the Semantic Web journal, is all about transparency. Submitted papers are made publicly available. Solicited reviews are made publicly available. Anybody else can additionally contribute a public review. Reviewers are publicly known by name. Discussions between reviewers and authors can (and should) happen in public. Reviewers and editors are acknowledged by name in the published versions of the papers.
The obvious reason for setting up an open review process is to improve the quality of the decision-making process. We have to realize that some persisting habits about reviewing have their origin in times when scientific publishing was made for a small expert audience, and had to be conducted by sending manuscripts and letters by conventional mail. Today, however, reviewing and publishing is inflationary, which substantially reduces the quality of the typical paper – and of the typical review. While we cannot simply reverse this trend, we can take advantage of the World Wide Web to counteract these developments and improve quality by bringing the review process out into the public space. Reviewers will put more effort into providing constructive reviews if they publicly sign their reviews. Open and public discussions on controversial submissions minimize errors in the decision making.
Personally, I also hope that the ensuing discussions will help to bring back a scientific tradition which has long been on the decline in our field: controversial but constructive discussion. Regretfully, these days we somehow tend to mainly present incremental results, bash opposing opinions, and sugarcoat our own …
Question: Past attempts to set up open reviewing for journals have failed …
Hitzler: Yes, I remember seeing some of these early attempts many years ago when I was a PhD student. Even back then I was doubtful if the sometimes rather radical setups had a chance. In the meantime, there is growing experience in other fields that open reviews can work out if set up carefully. In our case, we mix old-style with open, by still soliciting reviews, and by giving solicited reviewers the option to stay anonymous, if they see a need for this protection. We Editors-in-Chief also “steer” the journal in the sense that we have rather clear strategic targets, e.g. in terms of scope and quality, which we’re trying to meet. In short: rather than experimenting with radical changes, we mildly introduce a new but essential component – open reviewing – in a traditional scientific publishing process. That way, it will work.
Question: But isn’t anonymous reviewing necessary to protect the reviewers and in order to get objectively critical reviews?
Hitzler: Sometimes. That’s why it’s good that solicited reviewers can opt to stay anonymous. Open reviewing – like any form of assessment in science – isn’t perfect, and has its drawbacks. However, the current reality in Computer Science is that reviewing processes are often extremely poor and decision processes are not very transparent. For conferences, reviewer discussions and rebuttal phases were introduced some time ago to improve the decision making. Open reviews simply go a step further.
Question: Aren’t potential authors afraid of getting a public bashing in the review process?
Hitzler: Reviewers typically won’t bash if they sign with their name. And in fact, we monitor the reviews in order to make sure that they adhere to a certain minimal scientific standard. At the same time, it’s probably just as well if our public process makes people more reluctant to submit papers which are not yet mature enough for publication. We wouldn’t want to publish them anyway. And in order to protect authors of rejected submissions, we actually remove the corresponding papers and reviews from the website after some time.
While I understand that some people may be more reluctant to put their work out in the open before it’s been accepted through a review process, we have to be aware that many quality journal publications, like the ones we’re striving for, are extended versions of high-quality conference publications: so they have indeed already been through a review process. Furthermore, submitting to our journal gives added visibility for the work, since it’s up for public review on our website.
Question: Your journal also publishes papers which are not standard research papers. Aren’t you compromising scientific rigor by doing this?
Hitzler: Times are changing. The prime purpose of a scientific journal is to disseminate results to other researchers, and to do so through a quality filter. Traditionally, this dissemination was restricted to focused research contributions, targeted at other researchers working in the same narrow area as the author(s). Semantic Web as a field, however, is extremely diverse and comprises researchers and practitioners from many other communities. Consequently, high-quality tools, systems, ontologies, introductory surveys and application reports are very much needed for the dissemination of advances in our field to all interested parties. As for research papers, the role of the journal for these other types of papers is primarily quality assurance. And consequently, we have clearly formulated the evaluation criteria for different types of papers. A report on a high impact tool, for example, is thus not a direct research contribution in the traditional sense. But if the tool enables further developments in the field, then it is worth reporting, and it indirectly makes a contribution to scientific progress.
Question: Why are you still publishing through a commercial publishing house?
Hitzler: Because it helps. A lot. It’s easy to underestimate the amount of work which needs to be put into running a journal, and going with a commercial publisher rids the Editors-in-Chief and the Editorial Board from a lot of tasks which are not directly related with quality assurance. Open review does not mean that this kind of professional support is no longer needed. And we are glad that we have found a publishing house which is very accommodating to our ideas.
Question: What are plans for the immediate future?
Hitzler: We currently have more than 30 papers up for review, most of them responses to two recent calls, one on tools and systems papers, and one on applications of OWL – and some of the submissions seem rather prominent. We also have several special issues lined up, most of them have not been announced yet. The first issue will appear towards the end of the year and contain vision statements by the EB members – we do not normally publish vision statements, but this seemed an appropriate way to introduce the journal. Considering that the journal has been launched only 7 months ago, this means that we are already very well under way in pursuing our goal of establishing a high-quality scientific outlet in the field.
[author: Pascal Hitzler]
-
10:47 What if the biggest web company bought one of the central semantic web players?
» The Semantic PuzzleWell, exactly this happened yesterday: Google bought Metaweb – provider of Freebase. Freebase is an important hub in the linked data cloud providing 12 million entities with uniform resource identifiers most of them linked to other semantic web datasets like DBpedia or New York Times. For example: Google´s page on Freebase offers a rich source for machine-readable facts around this company.
What does this mean to the Semantic Web Community which has been working on a smarter web in the last decade?
Well, a lot… First of all, it´s good to hear that Google will continue to develop Freebase as a free and open database to everyone, saying “… we would be delighted if other web companies use and contribute to the data.”Until yesterday still a lot of companies were not fully convinced if the Semantic Web will play a central role in the further development of the Internet. Now the game has changed. The entity-driven approach to develop web applications has just started now:
We will keep on reporting and discussing how Google will influence the development of the Semantic Web – and if I had a wish for free: Please add RDF(a) to the Freebase widgets!
Semantic Web -
8:50 I-Semantics 2010: Relevance of semantic technologies for industry increases fast
» The Semantic Puzzle
I-Semantics will take place for the 6th time this year in September and it will be co-located again with I-Know in Graz/Austria. This year´s programme shows that Semantic Web and semantic technologies in general are increasingly relevant for all kind of industries:
- Biomedicine
- Public administration & Public transport
- Information technology
- Libraries
- Media & Content Industry
- E-commerce
- Education etc.

I-Semantics “Industry Track” with its 3-days programme full of demos is one of the highlights of the congress. With 28 submissions this year´s Triplification Challenge tells a lot about the significance of Linked Data in areas like librarianship, public administration or GIS & environmental planning. Take a look at the 15 nominees – and if you consider to come to I-Semantics 2010 follow the link for registration.
-
14:38 Report on developments at the European Semantic Technology Market
» The Semantic Puzzle
The present state of development, future trends and expected market scenarios for Semantic Technologies are shown in the just published “Demand driven Mapping Report”. The report is part of the EU-funded project Value It, which is about bringing together the various stakeholders within the sector: Industry, Research and Government. VALUE-IT preliminary findings show that the STE potential market in Europe will size up to €1.44B for 2014. Scanning furthermore the executive summary of the report, some findings attract attention:The survey results also show considerable variation by sector, both of policy and technology implementation. With respect to technologies, ICT companies are also the most willing to consider semantic approaches. The ICT sector has an unusually high interest in all ST components, with 20% or more being willing to consider all of them, and over half of IT respondents looking at Web 2.0 (social computing). [...] The use of tagging technologies – which overall is the least mature approach in the survey – is most advanced in Life Sciences. The Life Sciences, Media & Entertainment, and ICT sectors all have a reasonably strong interest in Natural Language Processing (roughly 25% on average). Ontologies and RDF/OWL are the technologies least often considered, though the interest in these Semantic Technologies is not insignificant. Taxonomies are slightly more popular, perhaps indicating that companies are taking the first step to prepare for a more semantic approach to IT solutions. The ICT, Energy & Utilities, and Media & Entertainment sectors all have a reasonably strong interest in using taxonomies.
The 190 pages report gives an actual overview of the status quo on European Semantic Technology Market and is now available for download: Final demand driven mapping Report
-
11:22 Vienna 01.07.2010 – Panel discussion on the Future Internet
» The Semantic Puzzle
Within the last year the SWC’s team run the project called “ZukunftsWeb” (Future Internet). After ten month of in-deep discussion, expert panels, webinars and the becoming of a book on the topic, it’s time to celebrate the past efforts and have also a look into the future. So this is why we want invite friendly to our evening event on july the first. So if you are in vienna that day, join us – we promise a inspiring evening, with nice people and wise talks.Venue: Filmmuseum Wien
Date/time: 01.07.2010 / 6pm -
6:00 Stella Dextre Clarke & Alan Gilchrist about the “Future of Knowledge Organization on the Web”
» The Semantic PuzzleSemantic Web Company (SWC) had the pleasure and the opportunity to talk with two internationally recognised experts in the fields of information management and knowledge organization: Alan Gilchrist and Stella Dextre Clarke. SWC asked some questions about the “Future of Knowledge Organization on the Web & Linked Data” on the occasion of an event of the same name organised by ISKO UK which will take place on September 14, 2010 in London.

1. Alan, you are one of the leading experts in the field of thesaurus construction. Organising knowledge in a (worldwide) Semantic Web is a rather young discipline compared to your domain. What do you think can the Semantic Web community learn from “traditional” thesaurus management and vice versa?
You put inverted commas round the word traditional, but it might be more appropriate to put them round the word thesaurus! So long as words are used in information retrieval and in information sharing, different forms of structured vocabularies will be required, and many of the fundamental principles of thesaurus construction are still valid for their construction. Of course, the “traditional” thesaurus has mutated since the days when it was used only for controlled indexing and retrieval; and now, with the many enrichments possible it can be viewed as an ontology (in one of the definitions of this word). What remains a difficulty is to create a generalisable typology of associative relationships, though this is, of course, possible in relatively closed systems. In short, structured vocabularies with broadly thesaurus formats will be a necessary component in the web stack.
2. Stella, as a consultant you are specialized in the design and implementation of knowledge structures for information retrieval applications. In the last few months we have seen that SKOS can serve as a significant building block to link “traditional” thesaurus management to knowledge structures from the semantic web. Can you see that this development is market-driven, is there a significant growth of demand for solutions built around SKOS?
This question sounds surprisingly sceptical about the growth of SKOS. I guess the dizzying speed of phenomena like Facebook and Twitter has fuelled expectations of tools springing up overnight like mushrooms, fully formed and ready to eat. But actually it takes time, not just for the tools to be fashioned, but for the potential market to develop an understanding of what they can do and what will happen next when they are used.
Applications for SKOS are springing up all the time, as fast as people can grow the skills and vision to deploy them. At the moment the market, or shall we say the power-base, seems to be with the academic sector and allied not-for-profit organisations. This will spread progressively through the public to the private sector, as enterprises find ways of adapting their business models. The main hurdles to overcome could be intellectual property rights and the need for compilers of databases to keep earning their living.
3. Alan, constructing thesauri for the semantic web also means that one has to make the “open world assumption”. In which sense does this change the way to manage thesauri, keep them growing and assure quality? Can you see new, upcoming methodologies to do that?
Everything changes with the “open world assumption”! Following on from my answer to the previous question, it seems clear that one manifestation of the thesaurus will be found in those systems that support interoperability, such as federated searching or metadata registries. Even with simple thesaurus management software, it is possible to construct a “master vocabulary” or “word bank” to support different applications within an enterprise; thereby promoting interoperability. More sophisticated software is already available (though not very widely); more will be needed and, doubtless, will be created.
A more formal answer to both questions will be found in a new standard – ISO 25964, currently being prepared on the basis of BS 8723. The two fundamental features of these two standards are (1) the thesaurus as a theoretical and practical basis for the construction of structured vocabularies for information retieval and (2) the growing and vital need for interoperability between systems and the intelligent mapping of the vocabularies used by those systems.
4. Stella, just recently at ESWC 2010, Sean Bechhofer was asked during his keynote why there are so few SKOS tools on the market. What do you think are the reasons for this? Are there still shortcomings of the SKOS specification compared to other existing thesaurus standards? (see also: [www.eswc2010.org] & [www.slideshare.net] )
Regarding the speed of development, see my reply above. As to shortcomings, did you note in one of Bechhofer’s slides: “Standardisation is necessarily a compromise: Everyone equally unhappy = success!” The SKOS development team took a conscious decision to keep the schema sufficiently simple that it could be applicable to as many different types of KOS as possible. On the downside, this means SKOS is unsatisfactory for conveying sophisticated features of some thesauri and classification schemes. But by keeping the entry barrier low, more widespread use has been encouraged.
By way of illustration, compare SKOS with the data model and XML schema of BS 8723. This schema is comparatively specialized, with the aim of enabling exchange of any thesaurus carrying any or all of the features recommended in the standard. And incidentally, this data model and schema will have some further capabilities added when published in the forthcoming standard ISO 25964. SKOS does not provide for a number of features in these standards (such as compound equivalence). But the schemas in BS 8723 and ISO 25964 are designed for thesaurus developers to share their work, rather than for easy publication on the Web, and will never have so many users or associated tools as SKOS.
So I believe that SKOS has done well to accept compromises that encourage generalisation although they might not suit some specialists. That said, I do regret one of its weaknesses in the context of mapping. Compound equivalence mappings (that is to say, where Concept A in one vocabulary maps to a combination of Concepts B and C in another) are very commonly needed when extending a search across multiple databases, and the SKOS mapping properties do not currently allow for them. Perhaps there will be some provision in future?
5. Stella, Alan, in September ISKO UK will organise an event on “The Future of Knowledge Organisation on the Web”. “Linked Data” seems to be a promising approach to organise knowledge in large scale environments.
Could you imagine that SKOS as a small subset of semantic web specifications will play a central role in this environment since it is quite intuitively comprehensible by virtually any knowledge worker or do you rather think SKOS is too simple (or too complex)? (see also: [poolparty.punkt.at] )Stella: Of course SKOS will have a central role (whether or not every knowledge worker finds it as intuitive as you suppose). “Linked Data” will find even wider applicability. ISKO-UK (the organiser of the meeting in London on 14 September) has a mission not just to spread the word about both these technologies, but to build bridges between the several communities who must share their expertise and data to build more exciting applications. We’re expecting an audience of over 100 at this low-cost event.
Alan: Yes, of course, just as all the tools in the web stack will be necessary if semantic web technologies are to be effective. But it is obvious that we are dealing with complexities of a higher order than ever before. Any structured vocabulary is an “artificial language” which, while acknowledging many aspects of theoretical linguistics is forced to be pragmatic in its construction. Consequently, it would not be surprising if SKOS is seen to be “catching up”, and this became apparent in the work of BS 8723 when thesaurus models using UML were being constructed. There remains much work to be done on all fronts.
Stella Dextre Clarke is an independent consultant specializing in the design and implementation of thesauri and other knowledge organization structures. She currently leads ISO NP 25964, the project to update and revise the international standards for thesauri. Previously she was the Convenor of the Working Group which developed BS 8723. In 2006 she won the Tony Kent Strix Award for outstanding achievement in information retrieval, in recognition for her development work on IPSV (Integrated Public Sector Vocabulary), as well as on the vocabulary standards. She is a Fellow of the Chartered Institute of Library and Information Professionals.
Alan Gilchrist has been a consultant for many years in the fields of information management and information architecture, specialising in the vocabulary aspects of information retrieval. He is co-author, with Jean Aitchison and David Bawden of Thesaurus Construction and Use, now in its fourth edition. In 1979 he founded and edited the Journal of Information Science, and is now Editor Emeritus. He has an Honorary Degree (D. Litt.) from the University of Brighton and is an Honorary Fellow of the Chartered Institute of Librarians and Information Professionals.
-
13:23 Kingsley Idehen: “By declaring its context, Linked Data can be made more easily reusable by others”
» The Semantic Puzzle
Semantic Web Company talked with Kingsley Idehen who is CEO of OpenLink Software and probably one of the most profound experts on data integration issues about “Linked Data”. The interview covers questions like:
- How can Linked Data help to make companies more productive?
- Do you think that the Linked Data Initiative can build upon a stable architecture or will it face more and more problems the bigger the “cloud” will grow?
- What´s the ultimate argument for an Enterprise Architect to use languages like SPARQL at least in addition to SQL?
- How will a “Real Time Semantic Web” change the whole game?
- How will the “Semantic Web” be called in 10 years? Will there still be a “Semantic Web”?
Read the full version of the interview here.
-
10:25 Lyndon Nixon: “With the hundreds of TV channels available, content selection becomes a significant challenge for users.”
» The Semantic Puzzle
From June 9 – 11, 2010 the EuroITV Conference discusses latest advances and research of media technology, HCI, media studies, and the content creation community. Tassilo Pellegrini talked to Lyndon Nixon, STI International, about the future role of semantic technologies in the television industry and how a Social Semantic Web might influence the traditional television experience.
At this year’s EuroITV conference you will hold a workshop on the EU project NoTube. Can you give us a brief insight what this project is about?NoTube is all about the future of television! We are seeing a significant shift in viewing patterns driven by the Web, which breaks the linear programming model and makes TV or video on demand a reality, whether it is being provided directly by the broadcasters or via a third party like Hulu or YouTube. The Web-based model taken up by viewers using their PC is being transferred back to the TV set in the lounge by IPTV applications running on Set Top Boxes or Internet TVs which come with Web access built into them. The strong interaction between the desires of users and technology has had its impact on the Web and as the gap between the Web and TV experience grows, we aim to translate features of the Web to TV, such as the personalised and community aspects. The NoTube European project puts the TV user back in the driver’s seat by generating user profiles from data the user creates on the Social Web, and in this way facilitating a personalised TV experience without an intrusive user profiling process.
What promises does the Social Semantic Web hold with respect to innovate the television experience? What is the vision?With the hundreds of channels available via modern TV providers, content selection and dealing with the vast amount of TV-related information become significant challenges for users. TV metadata is created and distributed by a small group of people, as a result of the closed-source information exchange protocols that are the standard for providing electronic programme guide (EPG) data to users. Yet people often have several clusters of personal data on the Web, such as their profiles on social networks, or ratings of videos on YouTube and IMDB.
Analogously, there are many isolated clusters of broadcast data on the Web, such as broadcast data on EPGs and background information on Wikipedia. Within the NoTube vision context, we speculate that the conjunction of all these bits and pieces of data provide accurate information on someone’s interests, which is suitable for generating relevant recommendations on TV broadcasts. We see progress on opening up this data with open standards and APIs such as Google’s OpenSocial, Facebook’s OpenGraph, DBPedia, the BBC ontologies and FOAF. Further, we assume that Semantic Web technologies provide important building blocks for realizing this vision, as they enable the global identification mechanism of URIs and the means to define relations between data anywhere on the Web. By integrating these different pockets of data, we can provide TV viewers with personalised recommendations for their viewing.
What economic effects on the value chain do you expect from semantically empowered television? Will there be new revenue opportunities with respect to advertising or Pay TV models?Our primary focus is on open source and open standards, so for example we are extending the open source MythTV media centre to develop first scenarios of personalised EPGs. However, down the road there are clearly commercialisation opportunities.
Another scenario in the project looks at personalised advertising, which is clearly somewhere where there are revenue opportunities. However, we take user privacy very seriously, and one aspect we need to tackle in NoTube is the fine line between analysing user activity (in order to personalise their TV experience) and using that analysis commercially.
The third NoTube scenario involves pushing personalised news streams to TV viewers. Here, one could imagine that such a service could be packaged within a Pay TV offer, and used to give competitive advantage or justify a higher fee.
Despite many attempts experience has shown that television is a rather conservative and innovation-averse medium. What can be done to stimulate the uptake of semantic technologies in the television sector?That’s true; in the traditional broadcasting sector the larger companies are extremely slow to adopt new technologies. However, I think Web video and TV has really shook up the sector – traditional broadcasters are seeing that they lose viewer share to Web-based offers and have been quick to take their video material to the Web. There is a clear demand for this, look at the viewing numbers for BBC’s iPlayer in the UK for example.
IPTV also means that new applications and services can be built on top of traditional TV. I think once the broadcasters see the added value of offering applications and services tied into the content of their programming – such as through semantic analysis of the program metadata, which NoTube is doing – they will be encouraged to support better these efforts. The BBC is really taking a lead in this, publishing a lot of their data already in RDF.
Workshop InformationThe NoTube workshop on Future Television: integrating the Social and
Semantic Web will take place at the EuroITV 2010 conference in Tampere, Finland on June 9, 2010.
For more information please seeand
For more information about NoTube, please see
http://notube.tv and follow our blog, at http://blog.notu.be
About Lyndon NixonDr. Lyndon Nixon joined STI International as senior postdoctoral researcher in November 2008. Previously he was a researcher at the FU Berlin, where he acted as Industry Area Co-Manager of the EU Network of Excellence KnowledgeWeb and double Workpackage Leader in the EU project TripCom. In KnowledgeWeb, Dr. Nixon organized and led activities promoting the transfer of semantic technology to industry. He received his PhD in January 2007 with the topic ‘Semantic Web enabled Multimedia Presentation system’. His research focus is Web-based TV/video and the semantically guided integration of Web-based content, and he has several publications and has organized a number of workshops around related themes.
-
8:54 Adrian Pohl: “We believe the Semantic Web plays an important role for the future of libraries.”
» The Semantic Puzzle
In March 2010 several Cologne-based libraries have opened their catalogue data under a CC0 license following Tim Berners-Lee’s call for “Raw Data Now!”. What has been the motivation behind this step?
A group of Cologne-based libraries has taken a big step towards open data. In an concerted action they have relased their catalogue data for reuse on the web. Project manager Adrian Pohl comments on the initiative and what role the Semantic Web will play for libraries in the future.The hbz (“Hochschulbibliothekzentrum des Landes Nordrhein-Westfalen”, english: “North Rhine-Westphalian Library Service Centre”) has come to the conclusion that libraries need to participate in the development of the Semantic Web. The opening of catalog data followed as a necessary first step. Our intention is to show with this first legal-political step how important the legal/licensing dimension is when you publish data on the web, be it Linked Data or not. So for us at the hbz the Open Data initiative primarily is seen as the first step in eventually publishing Linked Open Data just as Tim Berners-Lee had called for.
Other participants in the Cologne Open Data initiative like the Cologne University and City Library focus more on the direct advantages the releasing of raw bibliographic data bings: With other libraries and consortia following this example it will be easy to enrich existing catalog or other bibliographic services with subject headings, classification numbers, tags etc. Also, published raw data is integrated into other web services like Wikipedia which point back to libraries’ services. Indeed, Open Data is an end in itself which should be pursued by more organizations in the library world and beyond it.
The provided data is currently availble in a proprietary but open format. Can you give us some technical description of the published data? Do you have plans in providing more structured datasets in the future?“Opaque but open” would be the better description of the underlying format because it isn’t proprietary at all. Actually, alongside the data from the hbz union catalog there is data stemming from libraries’ local databases (see [opendata.ub.uni-koeln.de] and [opendata.zbsport.de] ). We are using different internal formats. Generally, all the formats are based on the MAB format (an acronym for “Maschinelles Austauschformat für Bibliotheken” which means “Automatic Interchange Format for Libraries”) that is only used in the German and Austrian library world for the data interchange between libraries similar to the better known MARC format (Machine-Readable Cataloging) of the Library of Congress. It was developed in the 1970s for storing data on magnetic tape. The format documentation can be viewed on the German National Library’s webpages. As the format is nearly 40 years old, the processing of MAB data is very cumbersome on modern computers. Therefore, the hbz provides an encapsulation method called “generic format”, where the historic data records of the library catalogs are unwrapped into a more common, user-friendly scheme. Each record is placed into a Unicode UTF-8 encoded file, containing all the MAB fields, each of them separated by line feeds, and the whole record set of a library is forming a “tar” archive, which is compressed afterwards to save space. It is possible to dump those archives by a usual unpack tool. This software is available on all known Windows/Linux/Unix platforms. Or you can use a simple Perl helper script provided by hbz. More tools and scripts, even in other programming languages, are in preparation for publication. The opaqueness and the age of the standards used in the library world (the english standard MARC which is used worldwide doesn’t differ in these respects from MAB) make it necessary to change to a more open and widely adopted standard. That’s where Linked Data comes into play which is based on the accepted and widespread standards HTTP and URIs. The construction of RDF out of the library catalog raw data is a very sophisticated design task. Our plans are to convert the existing data to RDF using proper vocabularies which enable us to lose as little information as possible and giving access to the data by providing a SPARQL endpoint.
Currently the data you provide is open but not yet linked. What are your plans when it comes to contribute to the Linked Data Cloud?I have to go into greater detail to answer this question properly. Viewed simply, the data of library institutions can be divided into two broad types: authority data and bibliographic data. Authority data splits up in data about people, about corporate entities and about subject headings. In Germany, authority data is maintained centrally by the German National Library in cooperation with the six German library consortia. Bibliographic databases consist of records about books or rather editions of books. Authority data and bibliographic data are already heavily linked, for instance a bibliographic record contains the author’s or editor’s authority number which links to the corresponding authority record. The German National Library is also working on migrating library data, especially authority data, into the Semantic Web. They recently made their Linked Data prototype for authority data publicly available. We have already taken first steps to cooperate and coordinate our efforts. The colleagues at the German National Library have recently developed a Linked Data prototype for their authority data. As they take care of authority data we focus ourselves on bibliographic data. At the moment we are exploring the technology and vocabularies for publishing bibliographic data as Linked Data. That’s a demanding task because besides the known vocabularies like Dublin Core or the Bibliographic Ontology (Bibo) which don’t fully map to the density and structure of the information in the catalogs, there has been several years’ work on the new comprehensive cataloging standard RDA (Resource Description and Access) for which a RDF representation has been developed. However, RDA in RDF needs to be modified a lot so that it can be applied to our bibliographic data. We are currently working on a vocabulary for the union catalog’s data based on existing vocabularies like Bibo and RDA. Of course, as soon as we will have published bibliographic data as linked data we will start linking to hubs in the Linked Data Cloud like DBpedia or GeoNames.
Publishing data to the LOD Cloud is one thing. Consuming data is another. Have you plans to integrate data from the LOD Cloud into your systems? Do you have policies for quality assurance?Of course the possibility to incorporate data from other sources easily is one major reason for us to publish Linked Data besides the goal of making libraries’ data an integral part of the web. Enriching our data with other data and providing new services through and with mashups would be a main reason to link to other data. We are, however, not working on such projects yet, because we first need to convert our legacy data to RDF.
What role will the Semantic Web play for libraries in the future?We believe the Semantic Web plays an important role for the future of libraries. Discussions about “Next Generation Catalogs” are a recurring theme in the library world since the 1990s. It is time to finally act and move our data enprisoned in opaque formats to a new level by improving its structure and underlying technology and by migrating to formats that can be easily consumed by others who are not part of the library world. Joining the Linked Open Data community seems to us the best way to go. Also, the production, publication and dissemination of academic literature is subject to ongoing and fundamental changes which have far-reaching implications for the work of academic libraries and their role in research and education. We believe that semantic markup and interlinking will play an important role in the development of knowledge production and thus indirectly will have great impact on libraries. Clearly, the Semantic Web can’t be cancelled out of the future of libraries.
Moreover, turning your question around, libraries could play an important role for the future of the Semantic Web. Libraries are trusted institutions and deeply grounded in our culture. As indicated above libraries have produced linked data (again: lower case) since the time of card catalogs. We undoubtly have some practice in producing and curating linked data which should be worth a lot to the Semantic Web community. We thus think libraries are predestinated for helping to coninuously order the messy place the Semantic Web always will be and ensuring its trustworthiness and stability.
About Adrian PohlAdrian Pohl is working at the Cologne-based North Rhine-Westphalian Library Service Center on Open Data, Linked Data and its conceptual, theoretical and legal implications. He regularly writes at Übertext: Blog about the internet, libraries and metadata, Linked Open Data, communication, epistemology and the like. He has studied communication science and philosophy in Aachen and is currently studying Library and Information Science at the Cologne University of Applied Science. You can follow him on Twitter: [twitter.com] .
-
17:10 Eric A. Franzon: “Semantic Technologies are becoming mainstream.”
» The Semantic PuzzleStarted in 2005 the Semantic Technology Conference has become one of the international community hot spots for the commercial application of and trend scouting in semantic technologies. Tassilo Pellegrini talked to the organizer Eric A. Franzon, VP of Wilshire Conferences and Semantic Universe, about what to expect from the upcoming event and how semantic technologies are becoming mainstream.
From June 21 – 25, 2010 the annual Semantic Technology conference will take place for the 6th time. Looking back: what has changed over time? What are the hot topics at this year’s conference?We launched SemTech in 2005 in San Francisco. It was a good turnout for a new event, with around 300 attendees. By 2009, that number had grown to 1100, so audience size has been a significant change, certainly. However, our interest all along was to grow an industry as well as an event, and I have absolutely seen that growth and maturation. Ours was the first conference devoted to the commercialization of Semantic Technologies, and at that first conference, there was a predominant academic presence. That’s not a bad thing – this, like so many technical industries, came out of academia. Nonetheless, it’s nice to see that by 2010, there is significant adoption by businesses and organizations. I actually feel comfortable saying that Semantic Technologies are becoming mainstream; certainly not ubiquitous, but widely adopted.
The hot topics at the 2010 conference include exciting news in areas we have covered extensively before such as Linked Data, Semantic Search, Healthcare, and Publishing. But we also are delving much more deeply into new domains that have received a lot of attention recently such as Open Government, Marketing & Advertising, and Social Networks. There are new standards benchmarks to discuss such as SPARQL 1.1 and the business rules work that is being done with RIF. Additionally, we are seeing a lot of traction in Semantics in the Enterprise, so SemTech will have quite a bit to offer in that area as well.
While semantic technologies have been around for quite some years now the advent of the Semantic Web added a new spin to the community. What do you expect for the future when it comes to the convergence of semantic technologies and the Semantic Web?I see Semantic Technologies as a superset of the space that is the Semantic Web. The Semantic Web is public; the area I call Semantic Technologies includes non-public, closed systems – behind firewalls. We’ve actually seen this before. At the same time that the World Wide Web really hit its stride in the mid-1990’s, we saw widespread adoption of portals and corporate intranets. Even though they did not sit on the public Web, these systems used the technologies of the Web to link documents, enabling organizations to share those documents globally, quickly, and inexpensively.
As the tools become better and we see more use cases in the Semantic Web, I see parallel development of semantically enabled enterprise systems. In the same way enterprises were using early Web technologies to share documents behind firewalls, they are now using semantic systems to share data globally, quickly, and inexpensively. At first – and we are seeing this already – in-house systems will consume data from the public Web, essentially mixing public and private data. This is relatively easy to do when both systems are built on a similar set of technologies, and there are an increasing number of rich data sets for companies to use. Think of a corporate system that consumes real-time stock data, for example. The system is not generating that information itself, but it might be using it in a corporate application.
One of the prominent topics at the moment is Linked Data which in connection with Semantic Web might evoke a paradigm shift in data integration issues. How do you experience this trend? How should companies react?If you think about the ‘traditional’ challenges that enterprises have faced in managing data and meta data — issues like integration, disparate data, unstructured data, governance, legacy systems, and data quality (to name a few) — Semantic Technologies offer solutions. They’re not always the best solution for every problem, and I don’t expect that RDBMS systems will go away, but there are companies using Semantic Technology today to make money and save money.
From your perspective: what are the most exciting things to look out for in the near future?There is a great opportunity for tool developers to enter the marketplace. The community is hungry for new tools and for semantic development to be integrated into the tools and development environments they are already using. Another area that I believe the industry is hungry for is good UI development. Data is powerful, but its usefulness is often only seen in solid visualizations and reporting. I expect that more of these tools will emerge in the very near future.
Tools for publishers like OpenCalais, Zemanta, and the rich semantics available in Drupal 7 are making it possible for less-technical people to include semantics in their web pages.
Another area to watch is consumer applications. Tripit, Siri, and Adaptive Blue’s Glue have shown that there is a market for data-driven applications for consumers.
About Eric A. FranzonOver the last decade, Eric Franzon has served as VP of Wilshire Conferences, where he has been exploring the world of enterprise data. As VP of Semantic Universe, he has worked to raise awareness and explain the usage of Semantic Technologies and Web 3.0 in business and consumer settings. A lifelong learner and teacher, Eric is frequently called on as a consultant, coach, and trainer around complex technical topics. He is an advisory committee representative with the World Wide Web Consortium and an Affiliate Analyst with Guidewire Group. Eric has also taught improvisational comedy, early childhood education, blues harmonica, and gender studies. A Chicago native, he now lives in Los Angeles.








































