Reactive, autonomous
-
12:11
Firefox4 on 64 bit Ubuntu
Now that Firefox 4 final is out, I wanted to switch to using it rather than version 3.x on my Ubuntu 10.10 systems. You can download a Linux version of Firefox 4.0 from the Mozilla site, but it's 32-bit only. It will run, but the problem I found is that all of my plugins (libflashplayer, etc.) stopped working. The solution is to install the 64-bit build of Firefox 4, which is available from a Launchpad PPA:
    sudo apt-add-repository ppa:mozillateam/firefox-stable
    sudo apt-get update
    sudo apt-get upgrade
That gave me a new, working Firefox 4.0. Most of my extensions worked OK, with the exception of del.icio.us (fix here) and All-in-One Sidebar (now uninstalled). And, weirdly, the language packs aren't compatible between 3.x and 4.0, but I assume that will get sorted out in time.
-
14:38
Quick tip: SHA1 or MD5 checksum strings in Java
In a recent project (OK, I was processing FOAF data), I needed to generate the hex-encoded string of a SHA1 checksum. The built-in Java security classes can do the heavy lifting of generating the checksum itself, but they deliver the result as a byte array, and I needed its string encoding. There are various code snippets around the web that do this by iterating over the byte array and incrementally building up the string in a buffer, but they look low-level and inelegant. The following is much neater (written for SHA1, but the method works for the other digest formats):
    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public String getEncodedSha1Sum( String key ) {
        try {
            MessageDigest md = MessageDigest.getInstance( "SHA-1" );
            md.update( key.getBytes( StandardCharsets.UTF_8 ) );
            byte[] digest = md.digest();
            // Signum 1 treats the bytes as a positive number; the zero-padded width
            // restores leading zeros that BigInteger.toString(16) would drop
            return String.format( "%0" + (digest.length * 2) + "x", new BigInteger( 1, digest ) );
        } catch (NoSuchAlgorithmException e) {
            // handle error case to taste; here we fail loudly rather than return nothing
            throw new IllegalStateException( e );
        }
    }
Kudos to Brian Gianforcaro on StackOverflow for the tip.
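To try the same approach on the other digest formats, here is a minimal self-contained variant that takes the algorithm name as a parameter. The class name ChecksumDemo and method hexDigest are illustrative names, not part of the original tip; the expected values in the comments are the standard published test vectors for the string "abc".

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hex-encodes any MessageDigest algorithm's output.
class ChecksumDemo {

    /** Hex digest of the UTF-8 bytes of input, e.g. algorithm "SHA-1" or "MD5". */
    static String hexDigest(String algorithm, String input) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // Two hex characters per byte; zero-padding keeps leading zeros
            return String.format("%0" + (digest.length * 2) + "x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hexDigest("SHA-1", "abc")); // a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(hexDigest("MD5", "abc"));   // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

The padding matters because roughly one digest in sixteen starts with a zero nibble, which BigInteger.toString(16) on its own would silently drop.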
-
23:30
Ruby, rvm and Eclipse
I've switched to using rvm for all of my Ruby development. Rvm makes it trivial to have multiple copies of Ruby installed side by side, whereas trying to do that using apt-get was proving a big headache. It has brought one minor gotcha, though: rvm runs a script from .bashrc or .bash_profile to set up the environment so that a current (default) version of Ruby is always selected. If .bashrc isn't run, there's no rvm and hence no Ruby. I have my Linux desktop set up with a quick-launch button for Eclipse on the toolbar docked to the edge of my screen. Launching Eclipse from there doesn't run .bashrc for the containing shell, so Eclipse wasn't seeing my installed rubies. I could point Eclipse at the actual location of the Ruby interpreters, but then my applications weren't able to load any gems.
My solution was to force Eclipse to run in a shell that does load .bashrc. This is simply achieved with the -i and -c flags for /bin/bash: -i forces an interactive shell (which loads .bashrc) and -c runs a specific command (Eclipse in this case), so the launcher command becomes /bin/bash -i -c eclipse (assuming eclipse is on the PATH; otherwise give its full path).
-
17:33
Maven, Eclipse and Tomcat: alternative to WTP
Despite some frustrations, I like using Maven in my Java projects for its dependency management and standardized project layouts. Since I develop primarily in Eclipse, I use the m2eclipse plugin to make Eclipse Maven-aware. That generally works well, except when developing web applications. Eclipse's primary tool for web application development is the Web Tools Platform, or WTP. WTP takes a fairly heavyweight approach to web apps; in particular it likes to copy the contents of a project into a special location (typically ${workspace}/.metadata/.plugins/org.eclipse.wst.server.core/tmp0) and use that as the web application context. It has to re-copy everything, including Maven dependencies, whenever code is updated. Partly for this reason, and partly because WTP and m2eclipse don't always play nicely together, I've been looking at alternatives.
A colleague recommends the Sysdeo Tomcat plugin for Eclipse, which runs only Tomcat (fairly obviously), not the other app containers that WTP supports, but does so in place: there's no need for code to be copied to a temporary context. Although there's no Eclipse update URL, the plugin is easy enough to install. It provides a Tomcat extension, DevLoader, which puts dependencies (including Maven-managed dependencies) on the app's classpath without them needing to be in WEB-INF/lib. To enable the DevLoader functionality with Tomcat version 6, I had to follow the instructions in this blog post to make a devloader.jar, rather than the instructions on the plugin website (which I suspect worked for older versions of Tomcat).
-
15:08
Design vs engineering: not an either-or
A serendipitous collision of two worlds through links passed via Twitter today. First, a nice rant from Andraz Tori from Zemanta: Do semantic web interfaces have to be ugly? Tori bemoans semantic web applications that let their clever technological underpinnings and perfect abstractions hang out for all to admire, rather than focussing on helping the user to achieve their goals through superior user experience design. Well and good. Then I got linked to an article by Brian Zmijewski of Zurb: Fewer engineers please!. In this piece, Zmijewski argues that smaller project teams are better: if two pizzas can't feed the team, then it's too big. In particular, the suggestion is that there should be fewer engineers messing up the user experience design (OK, I'm paraphrasing a bit ... but only a bit). Separate from the main thesis (I'll come back to that), the most interesting part of the article is the furious reaction in the comments, mostly from engineers who claim that the only role of designers is to "pretty things up", presumably by picking exactly the right shade of pastel or something. Of course, if I can borrow an American phrase, that's a total crock. Designers do much more than that. But equally, engineers shouldn't be parodied as introverted inaesthetes either. Nobody wants to develop a poor product or let down customers and users. It's bad for business, and it's also fundamentally unsatisfying to work on something that people don't like or won't buy.
What's most depressing to me is to hear from both sides "No, we solve problems - that's our training". And it's true: both engineering training and design training focus on solving problems. Which is right? Well, it's not a zero-sum game, so both camps are correct, up to a point. What we need more of, in my view, is really thinking about the end-user. The design community have, traditionally, had a greater strength in this regard, but it's a teachable skill, and it's a skill that should be shared and developed, not hoarded. Design thinking should not be the sole purview of designers. Let's work together to build some non-ugly semantic web interfaces, and other great products.
Are small project teams better? It's axiomatic that adding more people than necessary just increases waste and delay, but arbitrary rules are silly. It depends what you're trying to achieve. Yes, small teams can do good work, but, for example, nearly two thousand two hundred people built the LHC. That would be a singularly big pizza.
In summary: fewer arbitrary rules and demarcations, more user-centred renaissance developers.
-
18:29
Skype and Ubuntu: lost notification sounds - solution
I use Skype for Linux very successfully on my 64-bit Ubuntu 10.04 distribution; it's a great way to call in to the daily scrum at work, for example. Recently update-manager upgraded my Skype to version 2.1.0.81, at which point I stopped hearing notification sounds (e.g. the incoming-call notification, which is pretty handy!). I still got on-screen popup notifications where enabled, and I could still make calls and hear callers, but no notification sounds.
The solution was to open gnome-volume-control and massively increase the Alert volume slider. It had been at about 20%; it's now set at about 95%. Yet I now hear Skype notifications again at pretty much the same volume as before. So I don't know what's changed between gnome-volume-control and Skype: there may have been an update to the volume control that I didn't notice go by, or maybe this version of Skype is more sensitive to that setting. Still, at least that's one more annoyance off the list.
-
11:36
A bit of gentle advice for student projects
Because my email address is attached to some of the Jena source code, I quite often get direct email from students seeking help with projects. In general I don't mind this, though I'm sufficiently busy with actual work that I can't offer much direct support. My usual feedback is that most people need to get much clearer about what they are actually trying to do. Often they come with very vague ideas:
I have to complete my project on ontology. Can you please help me to create a Jena interface?
Which gives pretty much nothing to go on. As a UI designer, you need some idea of who you are designing for, and what they care about. Here's my standard advice, which I've given out sufficiently often that it's worth repeating it in public so that I can just refer to it in future!
My core advice would be to get much clearer about what you mean by "Jena interface" (or however you've termed your project). Without more context, I don't understand what that means, and I suspect you don't really understand it either.
So my suggestion is: consider a person using your interface when it's completed. Ask yourself some basic questions about them: are they beginners or experts? Are they a programmer, end-user, ontology designer, or what? What are the most important tasks they need to do in order to succeed in their role? For example, an ontology designer might want to load an ontology file that someone else has created and add their own annotations to it, to fit their customers' data needs. Or a programmer might want to search for an ontology that would help them represent concepts about wildlife for a media catalogue program, so that users of the catalogue can tag consistently. By making this description as detailed as possible, you get a much clearer understanding of your users' needs.
Then ask yourself what the three most important features of your interface are that would make those users' jobs easier. Imagine them sending a Twitter message: "Hey, this interface is really cool, it allows me to do X really well!" That gives you the starting point for your design process.
By the way, it doesn't matter if you are only doing this as a student project, not as a real product. Either find a group of real users to work with, or make up some users and jobs for them to do. That way, you'll end up with a more interesting interface, and it will make it much easier for you to write your end-of-project report!
And by the way, specific questions about Jena should be asked on the Jena support email list.
-
13:08
Google gdata library and maven
Despite a number of requests from a variety of users, there are as yet no official Maven versions of Google's gdata client Java library. There are various scripts to push the .jar files into a local repo, and one project on Google Code containing an older, mavenized version of the client library. I think it's up to the code maintainers (Google) to put a process in place for pushing official Maven versions of the gdata .jars out to the public repos. In the meantime, the best we can hope for is to load them into local repos. However, a problem with the scripts I've seen is that they bake the various library versions into the script. So here's my variant, which takes the artifact name and version from the .jar filename:
    #!/bin/bash
    # Install each gdata jar into the local Maven repo, deriving the
    # artifactId and version from the filename: gdata-<name>-<version>.jar
    for f in java/lib/*.jar
    do
      if [[ $f =~ java/lib/gdata-([a-z-]*)-(.*)\.jar ]]; then
        n=${BASH_REMATCH[1]}
        v=${BASH_REMATCH[2]}
        echo "installing mvn artifact $n $v"
        mvn install:install-file -DgroupId=com.google.gdata -DartifactId=$n -Dversion=$v -Dfile=$f -Dpackaging=jar -DgeneratePom=true
      fi
    done
Tested on my Linux system at home, but I'm expecting this to also run on Windows/Cygwin when I get into the office tomorrow!
Update: thanks to ildella on Twitter for pointing out that there's a maintained, up-to-date collection of mavenized gdata artifacts on Google Code.
-
16:01
British English Thesaurus for OpenOffice 3.x
OpenOffice 3.x on Ubuntu doesn't come with a thesaurus for British English (en-GB), which is irritating. Google shows lots of recipes for faking a British thesaurus using the installed US English thesaurus, which is at best an approximation, and in any case they don't work on OO3. A much better solution is to download the British English Thesaurus OpenOffice extension from the weekly whinge and install it (from any OO application: Tools » Extension manager... » Add...). Kudos!
del.icio.us: openoffice, thesaurus, solved.
-
8:37
Great article: learning from hostage negotiators
Really interesting article on Boxes and Arrows: what design researchers can learn from hostage negotiators. I've used something akin to the coach role in user studies that I have conducted in the past, though in our case the coach also acted as note-taker, freeing the lead interviewer to listen carefully to the interviewee without being distracted by getting the key points down on paper. Nonetheless, an essential part of the note-taker's role was to notice if the interviewer had missed some interesting avenue to follow, and to prompt (but not take over the dialogue).
del.icio.us: user-studies, best-practice.
-
0:57
Moving along
Friday October 30th was my last working day at HP Labs after 20+ years with HP, most of that in the research labs. I'm not going to introspect too much on the event itself; the reasons for the large-scale changes in staffing are for HPL management, not me, to discuss. Suffice to say that Jena will continue, and indeed become more open, and the current Jena team members will continue to contribute to the platform, albeit from different host organizations. For me personally, alongside a number of ex-HP colleagues, I'll be moving to a new Linked Open Data startup named Epimorphics. More details on what that involves in due course! After a long time in corporate R&D for a very large organization, I'm very much looking forward to working for a company that is smaller (but with big ambitions) and more agile.
In the meantime, it does mean that I can no longer be reached at my old email address, ian.dickinson@hp.com. If you used to use that address, please update your contacts list to point to i.j.dickinson@gmail.com.
-
19:21
ISWC research track - raw notes 4
Lifting events in RDF from interactions with annotated web pages - Stuhmer et al
Want to model complex events, e.g. multiple mouse clicks. Use case: online advertisement. Contextual advertisement, behavioural advertising. Contextual: e.g. AdSense, based on IP etc. Behavioural: based on history of user's web pages, using cookies or web-bugs.
Drawbacks: context - similarity matching not robust. Behavioural - old history may not be relevant. To remedy: build complex events as short-term profiles, model these in an OWL ontology. Schema seems to encode a basic ontology of event expressions: conjunction, sequence, etc. Simple events: DOM (incl. mouse clicks) and clock.
Add some context to simple events. More than just the DOM location (that would be just syntax). Annotate pages with RDFa, use those annotations to enrich simple events. If the event happens on one node which is an RDF subject, use that to constrain the choice of subjects. Otherwise, go up the DOM tree to the dominator node, which may be the document root.
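A rough sketch of the subject-finding step as I understood it: start from the DOM node where the simple event fired, walk up until an element carries an RDFa about attribute, and fall back to the page itself otherwise. This is my own toy illustration, not the authors' code; checking only about (real RDFa subject rules are richer) and using the page URI to stand in for "the document root" are both assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: finds the RDF subject to attach to a simple DOM event.
class EventSubject {

    // Walk from the event's target towards the root; the first element with an
    // 'about' attribute supplies the subject, otherwise use the page's own URI.
    static String subjectFor(Node target, String documentUri) {
        for (Node n = target; n != null; n = n.getParentNode()) {
            if (n instanceof Element e && e.hasAttribute("about")) {
                return e.getAttribute("about");
            }
        }
        return documentUri;
    }

    // Parses a well-formed XHTML snippet into a DOM tree.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Not well-formed XHTML", ex);
        }
    }
}
```

A click on a link nested inside an annotated div would be lifted to that div's subject, while a click elsewhere on the page only gets the page-level context.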
Contributions: the technical implementation, the event model itself.
Server side event processing - not done yet. [which makes it hard to see what the value is, since they don't illustrate the interpretation of the events]
-
15:58
ISWC - SPARQL WG panel
WG has just published a first set of six working drafts that indicate what's coming in SPARQL 1.1. Caveat: no decisions yet, just indications. Naming: SPARQL 1.1 query, update and service description. Picked about 10 out of about 50 proposed extensions. Stable by spring '10, completed in Aug '10.
- Project expressions: select something in a query that is not just a simple variable.
- Aggregates: min, max, count, etc.
- Subqueries: embed one query in another.
- Negation: SPARQL 1.0 makes it difficult to ask what is not known; fix this in 1.1.
- Service description: language for describing common extensions in a given SPARQL endpoint.
- Update language: 1.0 is read-only; there is a member submission of a rough draft of an update language, which will be in 1.1.
- Update protocol: use of HTTP POST for update. Map basic RDF operations to core HTTP operations (RESTful RDF via SPARQL).
The following are 'time permitting', and will be done if there is time:
- Property paths: arbitrary-length paths through the graph; regex-like expressions.
- Basic federated query: based on ARQ's SERVICE keyword.
- Entailment regimes: what it means to query SPARQL in the face of RDFS or OWL, or RIF rulesets.
- Common functions: commonly used built-ins.
Slides here.
Brief intro from WG members. Moved on to Q&A, but I had to leave to check out of the hotel.
-
14:14
ISWC Tom Mitchell keynote - raw notes
How will we populate the semantic web on a vast scale? - Tom Mitchell keynote
Three answers: humans will enter structured info; database owners will publish; computers will read unstructured web data.
Read the Web project. Inputs: initial ontology, handful of training examples, the web (!), occasional access to a human trainer. Goals: (1) system running 24x7, each day extract more facts from the web to populate the initial ontology, (2) each day learn to perform #1 better than the day before.
Natural language understanding is hard. How to make it more plausible for machines to read? Ways:
- leverage redundancy on the web (many facts are repeated often, in different forms)
- target reading to populate a given ontology, restrict focus of attention
- Use new semi-supervised learning algorithms
- Seed learning from Freebase, DBpedia, etc.
State of project today: ontology of 10^2 classes, 10-20 seed examples of each, 100 million web pages. Running on the Yahoo M45 cluster. Examples include both relations and categories.
All code is open-source, available on web site. Currently XML, working on RDF.
Impressive demo of determining academic fields: 20 input examples, looked like hundreds of learned examples, good quality results. Output includes the learned patterns and alternate interpretations considered. approx 20K entities, approx 40K extracted beliefs
Semi-supervised learning starts to diverge after a few iterations: it is under-constrained. Fix: make the task apparently more complex by learning many classes and relations simultaneously, which adds constraints. Unlabeled examples become constraints. Nested, coupled constraints. "Krzyzewski coaches for the Devils": have to simultaneously classify coach name and team name.
"Luke is mayor of Pittsburgh" - learn functions for classifying Pittsburgh as a city based on (a) "Pittsburgh" and separately (but coupled) (b) "Luke is mayor of"
Information from the ontology provides constraints to couple classifiers together: e.g. disjointness between concepts. Also provides for consistency of arguments in noun phrases (domain and range constraints).
Coupled bootstrap learner. Given ontology O and corpus C. Assign positive and negative examples to classifiers (e.g. cities are negative examples of teams). Extract candidates (conservatively), filter, train instance and pattern classifiers, assess, promote high-confidence candidates, share examples back to coupled classifiers using the ontology (including using the subsumption hierarchy).
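To make the coupling idea concrete for myself, here is a toy sketch of just the promotion step with a disjointness constraint: an instance promoted into one category becomes a negative example for every category declared disjoint with it, and a candidate that is already a negative example is not promoted. This is an illustration under those assumptions, not the Read the Web implementation; the class, method names and threshold are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy sketch of ontology-coupled promotion (hypothetical names throughout).
class CoupledPromotion {
    final Map<String, Set<String>> disjointWith = new HashMap<>();
    final Map<String, Set<String>> positives = new HashMap<>();
    final Map<String, Set<String>> negatives = new HashMap<>();

    // Disjointness from the ontology is symmetric.
    void declareDisjoint(String a, String b) {
        disjointWith.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        disjointWith.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Promotes instance into category if confident and consistent; returns true if promoted. */
    boolean promote(String category, String instance, double confidence, double threshold) {
        if (confidence < threshold) return false;
        // Coupling: an instance already known to belong to a disjoint category is blocked.
        if (negatives.getOrDefault(category, Set.of()).contains(instance)) return false;
        positives.computeIfAbsent(category, k -> new HashSet<>()).add(instance);
        // Share the new example back as a negative for every disjoint category.
        for (String other : disjointWith.getOrDefault(category, Set.of())) {
            negatives.computeIfAbsent(other, k -> new HashSet<>()).add(instance);
        }
        return true;
    }
}
```

Note that this hard constraint is exactly what makes ambiguous noun phrases such as "Cleveland" (city and team) awkward, as came up in the Q&A.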
Rather than focussing on single sentences in single docs, system looks across many sentences to look for co-occurrence statistics. Macro-read many documents, rather than micro-read single document.
Example of IBM learned facts. Rejected candidates might be good input to a human doing manual ontology design.
If some coupling is good, how to get even more? One answer: look at HTML structure, not just plain text. If some cars are li elements in a list, then the other li's are likely cars as well. PhD student Richard Wang at CMU - SEAL system. Combine SEAL and CBL. Combined system generally gets good results, though performance is poor in some categories (e.g. sports equipment). To address performance issues, extend ontologies to include nearby but distinct categories.
System runs for about a week, before needing restart. Some categories saturate fairly quickly.
Want a system that learns other categories of knowledge. Tries to learn rules by mining the extracted KB. Need positive examples - get from KB. Where to get negative examples? Not stored in KB. Get help from ontology. For restricted cardinality properties (e.g. functional), can infer negative examples.
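The trick for negative examples is simple enough to sketch: if a relation is functional and the KB asserts r(x, y), then r(x, z) is false for every other candidate z. A minimal illustration, assuming the functional relation is held as a subject-to-object map and the candidate objects are given; the relation and entity names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Sketch: derive negative examples from a functional relation (hypothetical names).
class FunctionalNegatives {
    record Fact(String subject, String object) {}

    // A Map models functionality directly: at most one object per subject.
    static List<Fact> negatives(Map<String, String> functionalFacts, Collection<String> candidateObjects) {
        List<Fact> result = new ArrayList<>();
        for (Map.Entry<String, String> f : functionalFacts.entrySet()) {
            for (String z : candidateObjects) {
                // Any object other than the asserted one must be a negative example.
                if (!z.equals(f.getValue())) result.add(new Fact(f.getKey(), z));
            }
        }
        return result;
    }
}
```

For example, with a hypothetical cityLocatedInState fact (Pittsburgh, Pennsylvania) and candidates Pennsylvania, Ohio and Texas, this yields (Pittsburgh, Ohio) and (Pittsburgh, Texas) as negatives.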
Examples of learned rules: conditional Horn clauses with weights. Showed some of the failed rules as well, e.g. skewed results due to partial availability of data. Good rules can be very useful, but bad rules are very bad; need human inspection to filter out bad rules.
Future work: add modules to inner loop of extractor, e.g. use morphology of noun phrases. Making independent errors is good! Also: tap in to Freebase and DBpedia to provide many more examples during bootstrap.
Q: can system correct previous mistakes? A: current system marches forward without making any retractions. Internally, evidence of error builds up. Should be possible to correct previously made assertions.
Q: how to deal with ambiguity? e.g. Cleveland is a city and a baseball team. A: current system is a KB of noun phrases, not entities in the world. Knows that things can't be in multiple categories. Leads to inconsistency. Need to change program to have distinct terms for the word and its separate senses.
Q: what about probabilistic KB? A: currently store probs, but hard part is how to integrate 10^5 probabilistic assertions. How to do prob reasoning at scale? Not known.
Q: can you learn rules with rare exceptions? A: can have exceptions, but not different types. Could understand the counter-examples to an otherwise good rule. Could generate new knowledge (example of 'continuing education students').
Q: how to deal with dynamic changes to the world? A: yes, it's a problem. Second most common cause of errors. Would need to add some temporal reasoning to the KB.
Q: what can we do from semweb to encourage ML researchers to contribute? A: it will happen, if you [sw community] can build big databases. Very excited about DBpedia. Suggest pushing on natural language processing conferences. They are not aware of these [semweb] kinds of resources. Btw, there are other linguistic resources you can draw on as well as WordNet, e.g. VerbNet, propnet (?).
-
20:45
Open architectures for open government - raw notes
Cory Casanave
Many different architectures and solutions from many vendors. Want to link them together in open architectures. Integrate data as part of the LOD cloud. Flexible, standards-based, info managed at source, ... Not perfect, but what other technology choice is there?
Architectures as data. Pictorial form: PowerPoint, Visio, sometimes UML. Architectures should be seen as data we can manipulate and share. Where the data is and how to use it. Example: process for decommissioning a nuclear reactor [is that an architecture?]. Current architecture assets are trapped in stovepipes.
Goal: linked open architectures. Want business and technology view of data. Need different viewpoints for different stakeholders, but over the same data.
Motivations include: federated data, collaboration, cost reduction, planning & governance, agile, drive IT solutions.
How? Publish architecture models as LoD, in their current vocabularies. [how to publish a visio diagram?] Roadmap for enhancing value: adapt and develop well-defined "semantic hub" models, map new architectures to these hubs, define stakeholder viewpoints, tools and techniques, external comment & input. Working groups at OMG and W3C.
Want standards-based, e.g. XMI, UML, BPMN, SoaML. Can publish these today. Example: data model with addresses, contact details, etc. Can convert XMI form into RDF [though the example given didn't look like valid RDF to me], and publish automatically by checking into a special SVN repo.
Demo at [portal.modeldriven.org]. Resources: GAIN initiative (open government, open linked data and architecture) [portal.modeldriven.org].
-
19:25
ISWC research track - raw notes 3
OntoCase: Automatic ontology enrichment based on ontology design patterns - Blomqvist
Ontology design patterns: www.ontologydesignpatterns.org. This talk focuses on content patterns: small ontologies with a specific design rationale.
Semantic web: want more lightweight ontologies from non-logicians, e.g. web developers. Start with domain specification (e.g. texts) and task specs (e.g. competency questions). Ontology learning is possible, but accuracy is low and relational information is especially hard. Problems with background information.
OntoCase aims to add some explicit background knowledge and enrichment, built on top of existing ontology learning tools. Adds ontology design patterns. Input: learned OWL ontology and set of patterns. Output: ontology enriched by patterns. By: matching, cloning and integrating patterns. Two modes: true enrichment and pruning mode. In pruning mode, only include the parts of the input that match a pattern.
Example: input concepts person, hero (subclass of person), stage. Match to agent pattern.
Evaluation ... missed some details ... can add background knowledge, and increase in accuracy of added relationships.
Future work: general improvements, does not use task inputs at the moment.
-
19:00
ISWC research track - raw notes 2
Graph-based ontology construction from heterogeneous sources - Boehm et al
Gene ontology: 28K concepts, 42K relations, takes human experts years to make a new release. Would like automatic ontology bootstrapping. Four steps: concept definition, concept discovery, relationship extraction, ontology extraction
Contribution: combination of heterogeneous information sources. Given a set of concepts and a large text corpus, create a directed weighted concept graph, then find a sub-graph that is consistent (cycle-free), valid and balanced.
List of desirable topological properties, tree form, balance, etc.
Solution 1: greedy edge inclusion. copy nodes first, then copy edges one at a time discarding any that add a cycle.
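Greedy edge inclusion is easy to sketch. The notes don't record the order in which edges are tried; heaviest-first is my assumption, as are the class and method names. Adding an edge from → to closes a cycle exactly when from is already reachable from to, so each candidate needs one reachability check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of greedy edge inclusion into a cycle-free concept graph (hypothetical names).
class GreedyDag {
    record Edge(String from, String to, double weight) {}

    static List<Edge> greedyAcyclic(List<Edge> candidates) {
        List<Edge> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Edge::weight).reversed()); // assumed: heaviest first
        Map<String, Set<String>> out = new HashMap<>();
        List<Edge> kept = new ArrayList<>();
        for (Edge e : sorted) {
            // from -> to creates a cycle iff 'from' is already reachable from 'to'
            if (!reachable(out, e.to(), e.from())) {
                out.computeIfAbsent(e.from(), k -> new HashSet<>()).add(e.to());
                kept.add(e);
            }
        }
        return kept;
    }

    // Iterative depth-first search over the edges kept so far.
    private static boolean reachable(Map<String, Set<String>> out, String start, String target) {
        Deque<String> stack = new ArrayDeque<>(List.of(start));
        Set<String> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (n.equals(target)) return true;
            if (seen.add(n)) stack.addAll(out.getOrDefault(n, Set.of()));
        }
        return false;
    }
}
```

With edges A→B (0.9), B→C (0.8), C→A (0.5) and A→C (0.3), the C→A edge is discarded because it would close the cycle A→B→C→A; the other three are kept. Nothing here enforces the balance or fan-out properties, which is what solution 2 addresses.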
Solution 2: find the set of nodes that are strongly likely to be super-concepts of other concepts. Recursively add children, using a fan-out limit.
Evaluation. Text corpus: PhenomicDB. Compared to Mammalian Phenotype. Weighted dominating set approach had highest precision.
[Author did not report on the human-acceptability of the auto-generated ontologies.]
Q: what's the basis of the desirability of the topological properties? A: introspection from own inspection. Comment from audience: tangled hierarchies can be shown to be better for browsing.
[other questions, rather hard to hear since there was no microphone]
-
18:30
ISWC research track - raw notes 1
Detecting high-level changes in RDF/S
Want to detect significant changes. Low-level language: reports adds and removes of triples. High-level languages define classes of event: change_superclass, pull_up_class, etc. High-level changes are closer to the intent of the change maker, and more concise.
Challenges: granularity (not too high or low), must be able to support a deterministic algorithm to assign triples to high-level changes.
Language defines triples added/removed, and semantic conditions on the triple set either before or after. For determinism, language must be complete and unambiguous in the consumption of changes.
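To check my understanding of the determinism requirement, a toy detector for just one change type, change_superclass, over a delta of removed and added rdfs:subClassOf triples. It pairs a removal with an addition only when the pairing is unambiguous (exactly one of each for the same subclass), and otherwise leaves the triples as low-level changes, so every triple is consumed exactly once and the output doesn't depend on iteration order. This is my own illustration, not the authors' algorithm; apart from change_superclass, the change names are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy detector for Change_Superclass from a low-level subClassOf delta (illustrative only).
class ChangeDetector {
    record SubClassOf(String sub, String sup) {} // sub rdfs:subClassOf sup

    static List<String> detect(Set<SubClassOf> removed, Set<SubClassOf> added) {
        Map<String, List<String>> del = bySubclass(removed);
        Map<String, List<String>> add = bySubclass(added);
        Set<String> classes = new TreeSet<>(del.keySet());
        classes.addAll(add.keySet());
        List<String> changes = new ArrayList<>();
        for (String c : classes) {
            List<String> d = del.getOrDefault(c, List.of());
            List<String> a = add.getOrDefault(c, List.of());
            if (d.size() == 1 && a.size() == 1) {
                // One superclass removed and one added: an unambiguous high-level change.
                changes.add("Change_Superclass(" + c + ", " + d.get(0) + ", " + a.get(0) + ")");
            } else {
                // Ambiguous or unpaired: keep as low-level changes rather than guess.
                for (String s : d) changes.add("Delete_Superclass(" + c + ", " + s + ")");
                for (String s : a) changes.add("Add_Superclass(" + c + ", " + s + ")");
            }
        }
        return changes;
    }

    // Groups superclasses by subclass, sorted for order-independent output.
    private static Map<String, List<String>> bySubclass(Set<SubClassOf> triples) {
        Map<String, List<String>> m = new TreeMap<>();
        for (SubClassOf t : triples) {
            m.computeIfAbsent(t.sub(), k -> new ArrayList<>()).add(t.sup());
        }
        for (List<String> supers : m.values()) Collections.sort(supers);
        return m;
    }
}
```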
Heuristic changes are those that require matchers, e.g. rename class.
[I have a nagging doubt that their algorithm won't work on a model that includes inference, not just raw triples. don't have a counterexample yet, though]
Algorithm is in theory quadratic, in practice the results are better than that.
Q: applies to OWL as well? A: no, only RDFS
Q: how do you know whether these are the right set of high-level changes? A: tested by introspection with human experts.
Q: related to refactoring languages in SW eng? A: probably, haven't looked at that.
-
16:21
ISWC In Use Track - raw notes 4
Rapid: enabling scalable ad-hoc analytics on the semantic web - Sridhar et al
Motivation: rapid growth in RDF data. Progress on storage, but not analytics.
Analytical queries include multiple groupings and aggregations, e.g. for each month of the year, the average sales vs the sales in the preceding month. Hard to do in databases, even harder in RDF because of: absence of schema, data and metadata combined.
Goal: use MapReduce to do RDF analytics. There are high-level dataflow languages, e.g. Pig Latin, but these languages expect structured, not semi-structured, data.
RAPID uses Pig as a basis and extends Pig Latin with RDF primitives. Showed raw Pig Latin program - about 10 steps. Q: how to automate/abstract this, to avoid chance of user errors? [missed a bit here]
Expression types: class expression, path expression. Three key functions: generate fact dataset (GFD), generate base dataset (GBD), multi-dimensional join (MDJ). GFD re-assembles n-ary relationships from triples. GBD: container tuples for each group for which aggregation is required. MDJ finds matches between base and fact tuples, and updates the base dataset.
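The GFD idea of re-assembling n-ary records from triples is what makes RDF analytics look like ordinary group-by queries. Here is an in-memory toy of that step plus a simple average-per-group aggregation, just to show the shape of the computation; RAPID itself does this in Pig Latin on MapReduce. The property names (month, amount) are hypothetical, and single-valued properties are assumed (a repeated property simply overwrites).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory illustration of fact-dataset assembly and aggregation (hypothetical names).
class FactDataset {
    record Triple(String s, String p, String o) {}

    // Group triples by subject: one record (property -> value) per subject.
    static Map<String, Map<String, String>> assemble(List<Triple> triples) {
        Map<String, Map<String, String>> records = new TreeMap<>();
        for (Triple t : triples) {
            records.computeIfAbsent(t.s(), k -> new TreeMap<>()).put(t.p(), t.o());
        }
        return records;
    }

    // Average of valueProp per distinct groupProp value; records lacking either are skipped.
    static Map<String, Double> averageBy(Map<String, Map<String, String>> records, String groupProp, String valueProp) {
        Map<String, double[]> acc = new TreeMap<>(); // group -> {sum, count}
        for (Map<String, String> r : records.values()) {
            if (!r.containsKey(groupProp) || !r.containsKey(valueProp)) continue;
            double[] a = acc.computeIfAbsent(r.get(groupProp), k -> new double[2]);
            a[0] += Double.parseDouble(r.get(valueProp));
            a[1] += 1;
        }
        Map<String, Double> avg = new TreeMap<>();
        acc.forEach((g, a) -> avg.put(g, a[0] / a[1]));
        return avg;
    }
}
```

The "preceding month" comparison from the motivating example would need a second grouping joined back onto the first, which is roughly where MDJ comes in.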
Reasonable results compared to non-optimised MapReduce applications. Comment from the audience: very slow (five orders of magnitude) compared to traditional data-warehousing.
[Saw comments on IRC via Twitter that this is just like early-'90s BI applications. The example wasn't well chosen from that point of view, but I think this is quite interesting. Doing analytics on large-scale datasets is going to be a huge problem, in my opinion]
-
15:58
ISWC In Use Track - raw notes 3
Tudor Groza et al - Bridging the gap between linked open data and the semantic desktop.
Web: the problem of finding and linking relevant work. Desktop: a publication silo, with the problem of finding and linking relevant files.
Linked open data on the web - linking. Semantic desktop - linking. Can we connect them?
Incremental enrichment process. Extract shallow metadata, expand using linked data, integrate into semantic desktop.
Extraction of metadata: shallow or deep. [Author going extremely quickly through his material, very hard to take notes ... have to read the paper]
Good results from small-scale user study.
-
15:39
ISWC In Use Track - raw notes 2
Kalyanpur et al - extracting enterprise vocabularies
IBM and Gartner. Enterprises need semantic vocabularies. Can they be generated bottom-up from source documents? Tried using NLP tools and off-the-shelf named entity recognizers, but poor recall (50% of possible terms identified by domain expert).
Summary of solution: an algorithm to discover domain-specific terms and types; techniques to improve the quality and coverage of LOD; statistical domain-specific NERs using LOD.
Discovering domain-specific terms: use a part-of-speech tagger to identify all nouns as possible terms, then filter using TF-IDF, then infer types using LOD, then use the types to further filter the terms. This resulted in 896 terms; the estimated number of probable terms in the full dataset would be 3000.
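The term-discovery pipeline can be sketched roughly as follows. This sketch rests on several assumptions. POS tagging has already reduced each document to candidate nouns. The `LOD_TYPES` dictionary is a hypothetical stand-in for looking up types in LOD. The TF-IDF variant and the 0.1 threshold are illustrative choices, not the authors'.

```python
import math
from collections import Counter

# Hypothetical input: candidate nouns per document (POS tagging assumed done).
DOCS = [
    ["server", "storage", "company", "server"],
    ["storage", "company", "mainframe"],
    ["company", "server", "router"],
]

# Hypothetical stand-in for a LOD type lookup (e.g. via term URIs).
LOD_TYPES = {
    "server": "ComputerHardware",
    "mainframe": "ComputerHardware",
    "router": "NetworkDevice",
    "storage": "Concept",
}

def tfidf_scores(docs):
    """Best TF-IDF score of each term over all documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n / df[term])
            scores[term] = max(scores.get(term, 0.0), score)
    return scores

def discover_terms(docs, lod_types, allowed_types, threshold=0.1):
    """Filter candidates by TF-IDF, then keep those whose LOD type is allowed."""
    scores = tfidf_scores(docs)
    candidates = [t for t, s in scores.items() if s > threshold]
    return sorted(t for t in candidates if lod_types.get(t) in allowed_types)

terms = discover_terms(DOCS, LOD_TYPES, {"ComputerHardware", "NetworkDevice"})
```

"company" appears in every document, so its IDF is zero and TF-IDF removes it. "storage" passes TF-IDF but is dropped by the type filter.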
Improving recall: improved type mappings between DBpedia and Freebase using conditional probabilities. The new mappings have been included in the DBpedia download since Aug '09. Improved LOD: add instance types. Get entity disambiguation for free using term URIs. Generate candidate patterns using super-types from the ontology, and let a machine-learning system score each candidate.
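Here is one plausible reading of "type mappings using conditional probabilities", sketched under stated assumptions. For entities linked across the two datasets, estimate P(DBpedia type | Freebase type) from co-occurrence counts and keep mappings above a threshold. The type names, sample data and 0.6 cut-off are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical sample: (Freebase types, DBpedia types) for each linked entity.
LINKED = [
    ({"fb:Company"}, {"db:Company"}),
    ({"fb:Company"}, {"db:Company"}),
    ({"fb:Company"}, {"db:Organisation"}),
    ({"fb:Person"}, {"db:Person"}),
]

def type_mappings(linked, threshold=0.6):
    """Estimate P(db_type | fb_type) over linked entities; keep mappings above threshold."""
    fb_counts, pair_counts = Counter(), Counter()
    for fb_types, db_types in linked:
        fb_counts.update(fb_types)
        pair_counts.update(product(fb_types, db_types))
    mappings = {}
    for (fb, db), n in pair_counts.items():
        p = n / fb_counts[fb]
        if p >= threshold:
            mappings[(fb, db)] = p
    return mappings

mappings = type_mappings(LINKED)
```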
Final result: starting from precision/recall of 80/23, raised to 78/46 with all improvements. Conclusion: lots of benefits from using LOD as input for vocabulary extraction.
-
15:00
ISWC In Use Track - raw notes 1
Auer & Lehmann - Spatially Linked Geodata
Many real-world tasks use spatial data. Current LOD datasets only have large-scale geographic structures, not bakeries, recycling bins, etc. How to get geo data for small-scale objects? OpenStreetMap (openstreetmap.org) provides a crystallization point for spatial web data integration. Stats on the current size of the database; growth rates of 7-11% monthly in various categories. Collaborative process; data is stored in an RDB but available as periodic dumps or incremental update feeds. Arbitrary key-value pairs can be added to any element, and these can be used to add semweb annotations.
Authors' project converts OSM models and properties to RDF/OWL. Result: 500 classes, 50 object properties, 15K data properties (which seems like a lot)
Use Triplify to generate RDF from relational data. Dump at linkedgeodata.org/Datasets, SPARQL endpoint hosted by OpenLink. Other REST interfaces: points within a circular radius of a given point (cool!), points within a radius belonging to a class, points within a radius with a given property value.
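The circular-radius interfaces amount to a distance filter. Here is a sketch using the haversine great-circle distance on a spherical Earth. The point records, class names and tags are hypothetical. The real LinkedGeoData service evaluates such queries server-side against its own index.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def points_within(points, lat, lon, radius_km, cls=None, prop=None):
    """Radius query, optionally restricted to a class or a (key, value) property."""
    return [
        p["id"] for p in points
        if haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km
        and (cls is None or p["class"] == cls)
        and (prop is None or p["tags"].get(prop[0]) == prop[1])
    ]

# Hypothetical sample points: two in central London, one in Oxford.
POINTS = [
    {"id": "bakery1", "lat": 51.5080, "lon": -0.1280, "class": "Bakery", "tags": {"shop": "bakery"}},
    {"id": "bin1", "lat": 51.5090, "lon": -0.1300, "class": "RecyclingBin", "tags": {"amenity": "recycling"}},
    {"id": "bakery2", "lat": 51.7520, "lon": -1.2577, "class": "Bakery", "tags": {"shop": "bakery"}},
]
```

For example, `points_within(POINTS, 51.5074, -0.1278, 1.0, cls="Bakery")` keeps only the nearby bakery.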
Want to link to other LOD datasets, e.g. DBpedia. Some owl:sameAs links in the schema are obvious. Also use DL-Learner to match categories. For instance data, three matching criteria: name, location, type. Some problems matching locations, since there is no consensus on where to place location markers for large entities like cities. For large countries, e.g. Russia, centroids can be 1000 km apart between OSM and Wikipedia. Needed some string-matching metrics to get name matches, but set the threshold fairly high. Generated 50K matches to DBpedia objects, mostly cities.
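Instance matching on the three criteria could look like this sketch. The notes don't say which string metrics were used, so `difflib.SequenceMatcher` is a stand-in. The name threshold, the per-type distance tolerances (which reflect the problem with large entities) and the sample records are assumptions. Distance uses a rough equirectangular approximation, which is enough for a threshold test.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-type distance tolerances (km): large entities need more slack.
MAX_DISTANCE_KM = {"City": 25.0, "Country": 1500.0}
DEFAULT_MAX_KM = 1.0
NAME_THRESHOLD = 0.9  # "fairly high"; the exact value is assumed

def approx_km(lat1, lon1, lat2, lon2):
    """Equirectangular distance approximation on a spherical Earth."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371.0 * math.hypot(x, y)

def is_match(osm, dbp):
    """Match on type, then name similarity, then a type-dependent distance limit."""
    if osm["type"] != dbp["type"]:
        return False
    name_sim = SequenceMatcher(None, osm["name"].lower(), dbp["name"].lower()).ratio()
    if name_sim < NAME_THRESHOLD:
        return False
    limit = MAX_DISTANCE_KM.get(osm["type"], DEFAULT_MAX_KM)
    return approx_km(osm["lat"], osm["lon"], dbp["lat"], dbp["lon"]) <= limit

def match(osm, candidates):
    return [c["uri"] for c in candidates if is_match(osm, c)]

# Hypothetical records (URIs are illustrative).
OSM_NODE = {"name": "Leipzig", "type": "City", "lat": 51.34, "lon": 12.37}
CANDIDATES = [
    {"uri": "dbpedia:Leipzig", "name": "Leipzig", "type": "City", "lat": 51.3333, "lon": 12.3833},
    {"uri": "dbpedia:Leipheim", "name": "Leipheim", "type": "City", "lat": 48.45, "lon": 10.22},
    {"uri": "dbpedia:Leipzig_(region)", "name": "Leipzig", "type": "Region", "lat": 51.30, "lon": 12.40},
]
matches = match(OSM_NODE, CANDIDATES)
```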
Demo - very nice. Faceted browsing can be used to narrow selections. Much effort to index data for efficient facet lookup. Quadtile indexing: 2 bits per quad, recursively. 18 zoom levels, producing a discrete hypercube.
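A quadtile key can be built by interleaving one longitude bit and one latitude bit per zoom level. That gives 2 bits per level and a 36-bit key at 18 levels. This is a generic quadtree sketch on a plain lat/lon grid. The exact projection and bit order used by LinkedGeoData are assumptions.

```python
def quadtile(lat, lon, zoom=18):
    """Interleave 2 bits per level (one lon bit, one lat bit) for `zoom` levels."""
    n = 1 << zoom
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)  # column index in [0, n)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)   # row index in [0, n)
    key = 0
    for level in range(zoom - 1, -1, -1):
        key = (key << 2) | (((x >> level) & 1) << 1) | ((y >> level) & 1)
    return key

key = quadtile(51.5, -0.12)
```

A coarser tile's key is a prefix of its children's keys. So all points in a zoom-k tile with key `q` fall in the contiguous range `[q << 2*(18-k), (q+1) << 2*(18-k))`, which lets an ordinary B-tree index on the key answer tile lookups.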
Future work: link to other datasets. Refine LGD schema. Refine browser. Apply best practices from other Geo projects.
-
14:07
ISWC keynote: Pat Hayes - raw notes
Two talks in one. Blogic (web log = blog, so web logic = blogic). RDF Redux: how we could easily revise RDF to make it more expressive, without changing the meaning of existing RDF.
Principles of blogic. Web portability: logic and entailments can be accessed elsewhere, and should commute. RDF is portable, ISO Common Logic is portable; OWL DL and classical FOL are not. OWL 2 is better, but not quite there.
Names. IRIs have structure and meaning, can be owned and controlled, etc. However, in logic, names are opaque tokens. Big disconnect, but not sure how to address it. RDF semantic interpretations are mappings from a given vocabulary, but it would be better to state 'from all possible names'.
Horatio principle: truly universal quantification not a good idea. OWL is mostly OK, but complement is problematic.
owl:sameAs is not the same as 'same as'. We need a way to describe co-reference without equating the conceptualisations, e.g. DBpedia and Cyc have different conceptualisations of sodium, but are linked with owl:sameAs.
Death by layering. The layer-cake diagram is a good picture of computer architecture layering, but a really bad approach for semantics. E.g. term URIs from OWL have different meanings depending on whether the triples are seen as basic RDF or as OWL.
Part 2: RDF redux
There are many things wrong with RDF that should be done better. [List]. However, there is a more basic problem: blank nodes in RDF are broken. The basic issue is that it is not obvious how to describe a bNode mathematically. The approach taken was to use set theory, but this was wrong: it uses a Platonic idea to describe syntax. The fix would be to view graphs as drawn on some surface, with bNodes as marks on that surface. RDF would be redefined as a graph + a surface, which doesn't operationally change any existing RDF. No graph can be on more than one surface. This fixes lots of problems (copy vs. merge, named graphs, etc.) and provides a syntactic scope for RDF nodes.
Surfaces themselves can have meaning, e.g. positive surfaces assert that their contents are true and negative surfaces assert that their contents are false; there could also be neutral surfaces and deprecated surfaces.
Surfaces would have to be allowed to nest, which would require changes to RDF syntax. With this, RDF would get full first-order semantics à la C. S. Peirce. Thus RDFS would not be a layer on RDF, but an abbreviation for assertions that are already expressible in (revised) RDF.
Question on tractability. Aren't the layers there for tractability? Ans: no, can still use languages with defined characteristics. Anyway layers don't do that either. This proposal is about metatheory, not practice.
Question: does it support other hard extensions like fuzzy langs, temporality? Ans: Doesn't solve, but gives it a clear point to start.
Question: (TimBL) isn't this what N3 has with curly-bracket contexts? Ans: maybe, but Peirce was first.
Q: so why not just fix RDF? A: would love to, what's the process?
Q: this borrows from conceptual graphs, but they aren't widely used, so why would this succeed? A: No, just suggesting a refinement of the foundations of RDF. Don't overemphasise Peirce.
Q: we want family of nearly-same-as relations. What does logic offer? A: good question, wish I knew the ans! Context is important - success of communication depends on choosing the right interpretation of names. Lynne Stein argues this is a much more fundamental problem.
-
15:17
Semantic Sensor Networks - raw notes 4
Semantic management of streaming data - Rodriguez et al
Extension to RDF with a triple store and query engine to bridge triple stores and streaming stores. RDF does not have a built-in concept of time or dynamic data. Virtual sensors project views out of streamed radar data.
Resources, not triples, are timestamped: an annotation containing the timestamp is added to the resource URI. Finding the latest value of a stream requires OPTIONAL and FILTER(!bound(...)) in regular SPARQL. In TA-RDF, this reduces to the annotation "[LAST]". Implementation based on Tupelo over Sesame.
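The plain-SPARQL "latest value" idiom is an anti-join: keep an observation only if no later observation exists for the same stream. This sketch reproduces that logic in Python, with the equivalent query shape in the docstring. The `ex:` vocabulary and the observation tuples are hypothetical, and TA-RDF's own "[LAST]" syntax is not reproduced here.

```python
# Hypothetical observations: (stream id, timestamp, value).
OBSERVATIONS = [
    ("radar1", 1, 10.0),
    ("radar1", 3, 12.5),
    ("radar1", 2, 11.0),
    ("radar2", 5, 7.0),
]

def latest_values(observations):
    """Anti-join: keep an observation only if no later one exists on the same stream.

    Mirrors the plain-SPARQL idiom (ex: vocabulary is hypothetical):
      SELECT ?stream ?v WHERE {
        ?obs ex:stream ?stream ; ex:time ?t ; ex:value ?v .
        OPTIONAL { ?later ex:stream ?stream ; ex:time ?t2 . FILTER(?t2 > ?t) }
        FILTER(!bound(?later))
      }
    """
    result = {}
    for stream, t, value in observations:
        has_later = any(s == stream and t2 > t for s, t2, _ in observations)
        if not has_later:
            result[stream] = (t, value)
    return result

latest = latest_values(OBSERVATIONS)
```

The naive anti-join compares every observation with every other one, which is one reason a built-in "latest" operator is attractive for streams.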
-
13:25
Semantic Sensor Networks - raw notes 3
Generating Data Wrapping ontologies from sensor networks - Sequeda et al
Goal: learn wrapper ontologies from sensor networks, analogous to data source wrappers. Straight road problem: cars on a toll road have sensors, and tolls are set to control flow. Seems to depend on a network of derived queries that are given for this domain. Was able to observe relationships between entities, but reducing the relationships to a recognisable simple form remains future work.
-
13:04
Semantic Sensor Networks - raw notes 2
A survey of the semantic specification of sensors. Compton et al, CSIRO.
W3C has an incubator group on semantic sensor networks, SSN-XG, developing a reference OWL model for describing sensors.
Table of 12 existing semantic sensor ontologies, some active, some not. At least one agent-centric. Two perspectives: data and sensor. All input ontologies agree that a central concept is Sensor, but with different descriptions. Many aspects to model, including structure, network, physical attributes, accuracy, energy use etc. Different ontologies differ in depth and expressive power.
Supporting technologies: DL reasoners, SPARQL, rules, task assignment. [I wonder how much of this would be relevant to the W3C reference ontology?]
Rich range of input ontology concepts, but there remain some concepts that do not yet appear in any of the precursor ontologies.
-
12:47
Semantic Sensor Networks - raw notes 1
-
21:56
SWUI'09 workshop - raw notes 4
Final discussion. What do we want to be able to show in +1 year? How to demonstrate utility of SWUI designs. Examples. Building a catalogue of patterns, interesting examples. Duane has a private collection of interesting examples from recent ISWC presentations and other publications. Should we try to build a collaborative version? Possibly create a standardised problem set as an evaluation/benchmark? How to show more leadership to the rest of the community? Perhaps through the semantic challenge? What are the successful examples we can publicise?
