OpenCalais - Official Blog
-
15:50 Looking for Thomson Reuters Clients (huh?)
Because usage and registration for OpenCalais are very open (all we really require is a name and an email address), it’s basically impossible for us to know who’s actually using the service. We know there are many thousands of you out there sending us many millions of documents per day, but we’re a bit in the dark about who the vast majority of you actually are.
We do know that you have a great affinity for Gmail and that the intersection of the sets “AOL mail users” and “OpenCalais users” is near zero.
We’d like to open up some conversations with Thomson Reuters clients about how we might leverage OpenCalais, our in-house Calais capabilities and the products you use from Thomson Reuters to make some interesting things happen. As a first step, we’d just like to talk with you about what you’re doing and see if we can help.
So – we’d appreciate it if you could raise your hand from the crowd and send us a note at team@opencalais.com and let us know who you are. We promise not to bug you – just a quick email and an invitation for a quick call if you’re interested.
Tom
-
17:29 Is OpenCalais Dead? - A Posting from Tom Tague
Is OpenCalais Dead? That was the title of a recent inquiry in our forums and of a blog post by a potential user. It’s one of those catchy questions that can develop a life of its own - so let’s address it.
Here’s the executive summary: No. It isn’t dead. It’s alive and well and isn’t going away.
Now, a few more details. The OpenCalais team has always been open and transparent about what we’re up to - and we’ll continue that in this note.
Big technology projects seem to go through a standard life cycle. At the start there’s extraordinary energy and rapid evolution in response to user needs and generally cool ideas. That’s usually followed by a period where the capability evolves for the sake of evolution and innovation - rather than in response to what users need. Once you’ve evolved past the point your users care about, you have a moment of truth where you decide either 1) our users are clueless, or 2) maybe we’re doing something wrong.
We chose option 2.
For a couple of years OpenCalais evolved at an incredible rate. We had releases on a monthly basis and big new functionality every quarter. We spent about 80% of our time on the road evangelizing the capability and building our base of users. It was really fun. It was really successful. We process millions of transactions per day, power over 10,000 web sites and provide the technology that powers literally dozens of startups.
We followed that with a couple of major upgrades - in particular the incorporation of a whole Linked Data ecosystem underneath OpenCalais for companies, geographies, products and a few other things. We were really excited about this new capability and we invested a lot of money in making it happen. We released it to moderate fanfare and were ready for a wealth of innovation and new OpenCalais-powered capabilities.
It didn’t happen. We saw very little adoption and no fundamentally new capabilities being built. That caused us to step back and think about what we were doing. The conclusion was pretty clear.
We’d gotten ahead of what our users cared about and of their ability to adapt to a constant flow of changes. It’s a classic trap for techno-heads: we were brimming with cool new ideas that we wanted to get built, and we outran what our users actually needed. So - we decided to slow things down and let use cases and the market for semantic extraction mature a bit. That’s what we’ve been doing. It’s happening more slowly than any of us expected back in 2009 - but it is happening.
We’re constantly tweaking OpenCalais for improved accuracy and performance. These are under-the-cover activities that won’t be noticeable for the majority of users. We continue to enroll hundreds of new users each week - some of whom are building really cool stuff. And, in the last couple of years, we’ve signed some really significant commercial agreements ranging from hundreds of thousands to millions of transactions per day.
So - stay tuned. OpenCalais isn’t going anywhere but we are waiting to hear a clear message from the marketplace on where it should go. I’ve personally been spending the majority of my time within Thomson Reuters as the head of product for the Reuters News Agency and Reuters.com. It’s becoming pretty clear to me that there are massive opportunities in the areas of news, its integration with social media and its utilization as a massive repository of knowledge. More to come on this in the future.
We’re here to stay. Any questions? Feel free to drop us a note at team@opencalais.com.
Tom
-
17:44 OpenCalais in the News
OpenCalais has been in the news a fair amount of late, and I wanted to round up the clips for easy review here.
We announced 10 innovative new partners, including iPad apps, video news services, a community for the hedge fund industry, and disaster response / real time communications platforms. The news netted coverage from SocialMedia.biz and SemanticWeb.com. The initiative overall also received coverage from Technorati, ReadWriteWeb and EContent Magazine (a profile of Sumit Shah, technical project lead for OpenCalais/Clearforest), Webwereld and Global Security Magazine. In other big news, Drupal creator Dries Buytaert’s company Acquia announced support and hosting services for OpenPublish, the Content Management System powered by OpenCalais and funded by Thomson Reuters. The news was covered by CMS Wire and InternetNews.com. Building on the Acquia news, we also announced in our blog that the US's oldest weekly news magazine, The Nation, had adopted the Web's most innovative CMS -- OpenPublish! Finally, OpenCalais was cited in ReadWriteWeb’s analysis of Facebook’s F8 ‘Open Graph Protocol.’ As always, we invite you to join the conversation on Twitter. -
19:55 The Nation Magazine Taps OpenPublish To Reach New Readers Online
BIG NEWS! The Nation Magazine, the US's oldest weekly magazine, has relaunched on the state-of-the-art OpenPublish platform from Phase2 Technology, a Drupal distribution with OpenCalais baked in.
The Nation Magazine’s redesign is the culmination of an 18-month effort to position the country’s oldest weekly magazine as a thought leader online. The redesign features strategic product innovations like search-engine friendly “topic pages,” story-level Twitter feeds and instantly customizable homepage and section designs.
This allows The Nation’s editors to respond more quickly to breaking news, and build cross-platform packages around major investigative reporting.
This same technology provides The Nation’s business staff with the flexibility to quickly configure customizable, innovative campaigns for advertisers and marketers.
OpenPublish is a Drupal publishing distribution from Phase2 Technology and Thomson Reuters that features OpenCalais’ semantic functionality throughout.
Visit the all-new online edition of The Nation here.
-
21:53 Introduction to OpenCalais
The free OpenCalais service and open API is the fastest way to tag the people, places, facts and events in your content. It can help you improve your SEO, increase your reader engagement, create search-engine-friendly ‘topic hubs’ and streamline content operations – saving you time and money.
OpenCalais is free to use in both commercial and non-commercial settings, but can only be used on public content (don’t run your confidential or competitive company information through it!). OpenCalais does not keep a copy of your content, but it does keep a copy of the metadata it extracts from that content.
To repeat, OpenCalais is not a private service, and there is no secure, enterprise version that you can buy to operate behind a firewall. It is your responsibility to police the content that you submit, so make sure you are comfortable with our Terms of Service (TOS) before you jump in.
You can process up to 50,000 documents per day (blog posts, news stories, Web pages, etc.) free of charge. If you need to process more than that – say you are an aggregator or a media monitoring service – then see this page to learn about Calais Professional. We offer a very affordable license.
OpenCalais’ early adopters include CBS Interactive / CNET, Huffington Post, Slate, Al Jazeera, The New Republic, The White House and more. Already more than 30,000 developers have signed up, and more than 50 publishers and 75 entrepreneurs are using the free service to help build their businesses.
You can read about the pioneering work of these publishers, entrepreneurs and developers here.
To get started, scroll to the bottom section of this page. To build OpenCalais into an existing site or publishing platform (CMS), you will need to work with your developers.
Why OpenCalais Matters
OpenCalais – and so-called “Web 3.0” in general (concepts like the Semantic Web, Linked Data, etc.) – matter because these technologies make it easy to automatically connect the people, companies and concepts in your content to related content on the rest of the Web.
So when you’re writing about Twitter's new Promoted Tweets offering, you can be automatically connected to the other stories about Twitter's Promoted Tweets without having to embed links along the way.
Creating standardized metadata is about revealing the connections between people, companies, concepts and events and forging connections to relevant and related content automatically – streamlining your editorial processes and saving you time and expense along the way.
Ultimately, this new set of technologies is driving the next wave of innovation in digital media, and has the potential to inspire yet another “boom” similar to what we saw with SEO and SEM.
As innovators like MediaCloud, ViewChange.org and Hedgehogs.net (three more OpenCalais early adopters) lead the way, we will see more and more publishers, entrepreneurs and developers learning how to work with the new tools.
Why OpenCalais is Free
OpenCalais is a strategic initiative from Thomson Reuters to support the interoperability of content across the digital landscape.
Our goal with this initiative is not to make money, but rather to make it easy for folks to categorize and tag their content in a uniform and consistent fashion that complies with Semantic Web standards.
Offering a de-facto standard for making content interoperable in this fashion ultimately benefits Thomson Reuters, as it enables us to track themes, memes and trends on the Web, and to potentially do things like link out to relevant content that helps provide context to our readers, customers and other constituents.
The value exchange to us is in the metadata, in the growing body of interoperable content, and in the ability to support innovation, experimentation and the continued evolution of the Web.
We are fully committed to OpenCalais, and we offer the API for both commercial and non-commercial purposes precisely to inspire creativity and enterprise by a new wave of innovators and entrepreneurs.
There is no plan to someday "drop the other shoe" and charge folks for the basic service.
How That Helps You
Ultimately, OpenCalais provides start-up entrepreneurs, publishers and institutions like libraries, museums and universities with the ability to forge a path into the future of digital media.
In addition to being a source for inspiration and innovation, the free service is a time- and cost-saving tool. It’s the fastest way to tag the people, places, facts and events in your content – and the easiest way to get the metadata you need in order to get truly creative with your user experience and user interface.
As our partners will attest, OpenCalais can help you:
- Improve your SEO as well as search and navigation within your own site
- Enhance your content with free open data assets from the Linked Data cloud
- Increase your reader engagement with greater personalization, ‘related reading’ sidebars, ‘more by this author’ widgets, search-engine-friendly ‘topic hubs’ and more.
- Streamline your editorial processes and content operations to save both time and money.
We are also thrilled that our price point has made OpenCalais a favorite with open source platforms like the Drupal Community, public research initiatives like DocumentCloud, and Ushahidi / Swift River – a leading real-time platform for crisis communications used in disaster recovery efforts around the world.
How to Get Started
There are a number of easy ways to get started.
If you are on WordPress, try the easy-to-install Calais Tagaroo plug-in, which automatically tags your content as you type. It can also fetch images from Flickr and videos from Google Video, which you can select for inclusion in your post, or disregard.
*Note: You must have a hosted site – or your own server – where you have access to install WordPress plugins in order to use Tagaroo. Blogs hosted on WordPress.com won’t work with Tagaroo.
If you want to control how your search results appear in Google and Yahoo!, you can try Calais Marmoset, a simple piece of code you embed in your site pages. Marmoset will collect the metadata from your page (in the form of RDFa) and hand it over to Google Rich Snippets and Yahoo! SearchMonkey so that you can customize the way your search results appear.
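To make “collecting metadata in the form of RDFa” a little more concrete, here is a minimal sketch, assuming a page that marks up values with `property` and `content` attributes. It is not Marmoset’s code: the `PropertyCollector` class and `collect_properties` function are hypothetical, and they read only a deliberately simplified subset of RDFa (no `about`, `typeof` or element text).

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, content) pairs from tags carrying both attributes.

    Hypothetical, simplified RDFa subset: only explicit `content`
    attributes are read; `about`, `typeof` and element text are ignored.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; names arrive lowercased.
        a = dict(attrs)
        if "property" in a and "content" in a:
            self.pairs.append((a["property"], a["content"]))
    # Self-closing tags such as <meta ... /> are routed here too, because
    # HTMLParser's default handle_startendtag calls handle_starttag.


def collect_properties(html):
    """Return the (property, content) pairs found in an HTML fragment."""
    parser = PropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


page = ('<div><span property="v:name" content="Acme Widget"></span>'
        '<meta property="v:rating" content="4"/></div>')
print(collect_properties(page))
```

The `v:` prefix in the sample page is only an illustrative vocabulary prefix; a real RDFa processor would also resolve prefixes to full URIs.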
If you want to extract metadata from Web pages using URLs, then try our Semantic Proxy service at SemanticProxy.com.
If you are using the popular open source platform Drupal, you can find a complete Calais Collection of modules for easy integration.

If you are building a new site from the ground up, consider using OpenPublish, a free Content Management System based on Drupal.
OpenPublish bakes OpenCalais in throughout to “semantify” your site and automate the creation of ‘related reading’ widgets, ‘topic hubs’ and more. OpenPublish comes from Phase2 Technology and Thomson Reuters, and now comes with the option of fee-based support and hosting from Drupal founder Dries Buytaert’s company, Acquia.
To build OpenCalais into an existing site or publishing platform (CMS), you will need to work with your developers. Developers can find the resources and information they need on the OpenCalais documentation pages, and - for general education on the Semantic Web - at Semantic Universe.
Thanks for reading, and please join the open conversation with us on Twitter.
- Follow @OpenCalais for general updates.
- Follow @TomTague or @FinkelM for technical questions.
-
20:05 Developers: Win up to $25K in cash and prizes in The StreetApps Challenge
As you may know, OpenCalais is a strategic initiative from Thomson Reuters, an industry leader in providing timely, accurate information to financial professionals.
With those professionals increasingly turning to mobile devices, Thomson Reuters is challenging developers to build the most useful, high-quality and visually appealing mobile apps, and is offering $25,000 in cash, as well as premium mobile device prizes and a feature on a jumbotron in New York’s Times Square.
The StreetApps Challenge team is opening up access to Thomson Reuters premium content for the challenge through Thomson Reuters Knowledge Direct, an application programming interface (API). Contestants are also encouraged to include relevant external content that will differentiate their application.
This competition's intention is to include developers in the creative process and challenge the public to think differently about how financial professionals use their mobile devices while on-the-go.
Importantly, as part of Thomson Reuters’ commitment to the mobile space, developers retain all intellectual property ownership for their submissions.
For more details, go to: http://www.streetappschallenge.com
-
15:15 Why OpenCalais?
Over the last few months you’ve probably seen a number of announcements about how OpenCalais has been chosen by one organization or another to support its business.
In a number of recent meetings I’ve been asked the (very fair) question, Why OpenCalais and not one of the other entity extraction services out there?
Given that the question seems to be coming up more often as the number of extraction services increases, I thought I’d share my best understanding of why many of the major players we’ve announced (and an equal number we haven’t) have chosen to go with OpenCalais. And – at the end – I’ll mention a few reasons why others haven’t chosen OpenCalais.
So, in no particular order, why do organizations choose Calais?
Thomson Reuters
OpenCalais is provided by Thomson Reuters – the largest professional information organization in the world.
If you’re interested in kicking around some semantic technologies in your spare time this doesn’t really matter. If you’re incorporating those technologies deep within your business – or, as is true with many users – actually building a new business on top of them, this becomes pretty important. Basically – you need to know that the service is going to be there for you.
Facts & Events
With the increase in structured content assets like Wikipedia / DBpedia, it’s become pretty easy to knock out a basic entity extraction tool. And – while we like entity extraction as much as anyone else – it’s really just the tiniest starting point in what you can and will need to do.
OpenCalais extracts a wide range of facts and events from unstructured content and lets you know what’s happening in your content – not just tags for things.
- Facts are things like “John Doe is CEO of XYZ Corporation.”
- Events are things like “XYZ Corporation today announced that it would acquire ACME Corporation.”
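The difference between tags and facts or events can be sketched with a tiny data model. This is a hypothetical illustration under stated assumptions, not the OpenCalais output schema: `Relation`, `as_tags` and `acquisitions` are invented names, and the type labels "PersonCareer" and "Acquisition" are used purely as examples. It shows that the same two sentences flattened into tags lose the information about who is acquiring whom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A typed relation between entities (illustrative, not the OpenCalais schema)."""
    kind: str     # hypothetical type label, e.g. "PersonCareer" or "Acquisition"
    roles: tuple  # ordered (role, value) pairs


def as_tags(relations):
    """Flatten relations into sorted unique values: roughly what a plain tagger yields."""
    return sorted({value for r in relations for _, value in r.roles})


def acquisitions(relations):
    """Answer 'who is acquiring whom?', which a flat tag list cannot."""
    out = []
    for r in relations:
        if r.kind == "Acquisition":
            roles = dict(r.roles)
            out.append((roles["acquirer"], roles["target"]))
    return out


fact = Relation("PersonCareer", (("person", "John Doe"), ("position", "CEO"),
                                 ("company", "XYZ Corporation")))
event = Relation("Acquisition", (("acquirer", "XYZ Corporation"),
                                 ("target", "ACME Corporation")))

print(as_tags([fact, event]))       # ['ACME Corporation', 'CEO', 'John Doe', 'XYZ Corporation']
print(acquisitions([fact, event]))  # [('XYZ Corporation', 'ACME Corporation')]
```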
OpenCalais is the only service that does this in a production-strength manner.
Reliability
OpenCalais stays up. It’s hosted in mirrored data centers thousands of miles apart from each other. It’s monitored 24/7. It basically doesn’t go down – even during system upgrades and maintenance. We stopped adding 9s after we got beyond 99.99% uptime.
Accuracy
We’ve been building the tools underneath OpenCalais for over a decade. They’ve been used by hundreds of organizations and many thousands of end users. One of the things we’ve learned is that accuracy matters. While no NLP system is perfect, we’re convinced ours is the best and we have some ideas in the pipeline to increase accuracy even more.
Integration
We basically focus on providing great semantic plumbing. But we know that not everyone wants to be a plumber. We’ve worked to integrate (or motivate others to integrate) OpenCalais with a wide range of tools including Drupal, WordPress, WordPress Multiuser, Oracle, Lucene, ColdFusion, Flash, Firefox, Prolog, Lisp, Django, Java, PHP, Python, Alfresco, Perl, .NET, Ruby, TopBraid and a few others.
From content management systems to language-specific libraries – there are lots of ways to get started quickly.
Linked Data
We’re serious about Linked Data. We’re also worried about the proliferation of incorrect links and the effects of link rot. So, rather than just pointing to Linked Data assets out on the cloud and risking that they’ll go stale, we host our own Linked Data cloud, which is kept up to date with both Thomson Reuters contributed content as well as regularly validated links to other sources such as DBpedia, Freebase and others.
SocialTags
Pure semantic extraction is great – but sometimes you need more. If you’re writing about Porsches and Ferraris you’d probably like to have categorization concepts like “sports cars” and “automobiles” returned to you with your semantic metadata. OpenCalais does this via our ever-improving SocialTags concept tagging capability. It’s good now, and it’s going to get a lot better soon.
Focus
OpenCalais is here to provide great semantic plumbing. We’re not trying to sell ads. We’re not trying to provide the prettiest decorations for blogs. We build the plumbing – you architect the solutions.
Now, in a spirit of transparency, here’s why some people don’t choose OpenCalais:
Languages
We’re great in English and okay in French and Spanish (we extract entities but neither facts nor events in these two languages). We intend to implement more languages in the future – but for the time being we’re concentrating our efforts on improved functionality and accuracy in English.
Complexity
OpenCalais isn’t a simple tagging tool. What it returns to the calling application is a reasonably complex RDF construct. It takes a little time to get up to speed on RDF and how to use it in your applications. We think it’s worth it because it’s the most flexible and powerful format we know of.
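For a feel of what getting up to speed on an RDF response involves, here is a minimal Python sketch using only the standard library. It assumes RDF/XML output made of `rdf:Description` nodes that each carry an `rdf:type` and a `name` property; the `http://example.org/...` namespace and type URIs are placeholders, not the actual OpenCalais vocabulary, and a production application would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PRED = "http://example.org/pred/"  # hypothetical predicate namespace

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:c="{PRED}">
  <rdf:Description rdf:about="http://example.org/e/1">
    <rdf:type rdf:resource="http://example.org/type/Company"/>
    <c:name>XYZ Corporation</c:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/e/2">
    <rdf:type rdf:resource="http://example.org/type/Person"/>
    <c:name>John Doe</c:name>
  </rdf:Description>
</rdf:RDF>"""


def names_by_type(rdf_xml):
    """Group entity names by the last path segment of their rdf:type URI."""
    root = ET.fromstring(rdf_xml)
    result = defaultdict(list)
    # ElementTree spells namespaced names as "{namespace-uri}local-name".
    for desc in root.findall(f"{{{RDF}}}Description"):
        type_el = desc.find(f"{{{RDF}}}type")
        name_el = desc.find(f"{{{PRED}}}name")
        if type_el is None or name_el is None:
            continue  # skip nodes that are not typed, named entities
        type_uri = type_el.get(f"{{{RDF}}}resource")
        if type_uri is None:
            continue
        result[type_uri.rsplit("/", 1)[-1]].append(name_el.text)
    return dict(result)


print(names_by_type(SAMPLE))
```

Even this toy version has to deal with namespaces, typed nodes and untyped ones, which is the learning curve the paragraph above refers to.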
Performance in Knowledge Domain ‘x’
Where ‘x’ is fashion or square dancing or rugby. OpenCalais is optimized for performance in the general world of business – that’s where we excel.
We have extended OpenCalais to take steps in other areas (such as sports, media, etc.) – but if you need deep semantic extraction capabilities related to protein binding – there are better places to look.
-
17:50 And we're back! OpenCalais 4.3 is running on all servers
We are happy to report that we have resolved the bug that we identified in the initial 4.3 release, and that the new and improved OpenCalais 4.3 is up and running on all servers.
As a quick reminder, here are the new features and expanded capabilities of the OpenCalais service. As always, please let us know if you run into any issues or have any questions.
New in OpenCalais 4.3
Improved ‘Social Tags’: We are expanding on our popular Social Tags categorization technique by adding more generalized, aggregate tags.
For example, if a blogger is comparing the racing performance of sports cars like the Ferrari 308 GTB and Porsche 959, OpenCalais 4.3 will suggest auto racing and motorsport as Social Tags, in addition to the more obvious sports cars.
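One simple way to picture “more generalized, aggregate tags” is a walk up a broader-term map. The sketch below assumes a small hand-made `BROADER` dictionary and a hypothetical `aggregate_tags` function; it illustrates the idea only and is not how OpenCalais actually derives its Social Tags.

```python
from collections import deque

# Hypothetical broader-term map; not OpenCalais's actual taxonomy.
BROADER = {
    "Ferrari 308 GTB": ["sports cars"],
    "Porsche 959": ["sports cars"],
    "sports cars": ["automobiles"],
    "auto racing": ["motorsport"],
}


def aggregate_tags(tags, broader=BROADER):
    """Return the input tags plus every broader term reachable from them.

    Breadth-first, so the original tags come first and more general
    terms follow; the `seen` set keeps duplicates (and cycles) out.
    """
    result, seen = [], set()
    queue = deque(tags)
    while queue:
        tag = queue.popleft()
        if tag in seen:
            continue
        seen.add(tag)
        result.append(tag)
        queue.extend(broader.get(tag, []))
    return result


print(aggregate_tags(["Ferrari 308 GTB", "Porsche 959", "auto racing"]))
```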
NEW! ‘News Names’: We are instituting a process of name normalization that represents a first step toward our more robust vision for person disambiguation. Whenever a partial or extended name appears in content, OpenCalais 4.3 will return the names it finds as usual, but will now also suggest the most commonly used form of that same name.
For example, for articles containing Barack Obama, Obama or Barack Hussein Obama, OpenCalais will suggest not only the partial or extended name it found, but also the more frequently used Barack Obama.
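The News Names behaviour (return the name as found, plus its most commonly used form) can be sketched as a frequency lookup. This is a hypothetical illustration: the `NAME_VARIANTS` counts are invented, `news_name` is not an OpenCalais function, and the grouping rule (variants share their last token) is a naive assumption.

```python
from collections import Counter

# Hypothetical corpus counts for the surface forms of one person's name.
# The numbers are illustrative only.
NAME_VARIANTS = {
    "obama": Counter({"Barack Obama": 900, "Obama": 600, "Barack Hussein Obama": 40}),
}


def group_key(name: str) -> str:
    """Naive grouping assumption: variants of a name share their last token."""
    parts = name.split()
    return parts[-1].lower() if parts else ""


def news_name(found: str) -> str:
    """Return the most frequently used form of `found`, or `found` itself if unknown."""
    counts = NAME_VARIANTS.get(group_key(found))
    if not counts:
        return found
    return counts.most_common(1)[0][0]


for mention in ["Obama", "Barack Hussein Obama", "Barack Obama"]:
    print(mention, "->", news_name(mention))
```

Grouping by surname alone would wrongly merge different people who share a surname, which is why name normalization like this is only a first step toward real person disambiguation.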
New Entities, Facts and Events in English, including:
- New Natural and Manmade Disaster attributes that reveal these disasters’ effects
- Supporting data for upcoming events that will enable OpenCalais to recognize new Movies, Music Albums, etc., as well as anticipated Medical Treatments
- More Political Events and new items such as Diplomatic Relations, Political Endorsements, Poll Results and Voting Results
- Enhanced Person Career extraction that includes political party affiliations where those are included in the text.
The 4.3 release also features improved Simple Format and Microformat outputs, as well as several extraction bug fixes. For technical details, please see the full release notes here. -
1:57 Well, Ouch, One Step Backwards - Reverting to 4.2 for a day or two
We released OpenCalais 4.3 a day or so ago - and we’ve run into a few issues as we’ve rolled it out into production. We think we have a handle on the fixes needed – but to play it safe we’re going to roll back to Release 4.2 for at least the weekend.
Sorry for any inconvenience – but we’d rather play it safe and take a day or two to get things totally in shape.
Tom -
16:23 OpenCalais 4.3 debuts with key updates to Tagaroo, SemanticProxy & Gnosis
Big News! To kick-start 2010, we have updated virtually every tool we have in the OpenCalais arsenal. Check them out and let us know your thoughts.
- Version 4.3 of the OpenCalais service features enhanced Social Tags for categorization; improved disambiguation; new entities, facts and events in English; and an entirely new feature called News Names.
- By popular demand, the improved Tagaroo plugin for WordPress gives users more control over the tagging process.
- Our URL submission tool, SemanticProxy, has been normalized to ensure that its results are identical with those you would get by programmatically submitting content to OpenCalais via the API. Additionally, we are now supporting JSONP. You can now specify a callback function when requesting results in JSON.
- Our Gnosis plugin for Firefox now includes categorization of processed Web pages in the sidebar and can sort entities by alphabetical order or according to frequency or relevance.
- And finally, we have published the OpenCalais Schema in OWL, enabling OWL enthusiasts to apply ontology-based tools to OpenCalais metadata.
Below please find further details on OpenCalais 4.3, Tagaroo and SemanticProxy.
New in OpenCalais 4.3
Improved ‘Social Tags’: We are expanding on our popular Social Tags categorization technique by adding more generalized, aggregate tags.
For example, if a blogger is comparing the racing performance of sports cars like the Ferrari 308 GTB and Porsche 959, OpenCalais 4.3 will suggest auto racing and motorsport as Social Tags, in addition to the more obvious sports cars.
NEW! ‘News Names’: We are instituting a process of name normalization that represents a first step toward our more robust vision for person disambiguation. Whenever a partial or extended name appears in content, OpenCalais 4.3 will return the names it finds as usual, but will now also suggest the most commonly used form of that same name.
For example, for articles containing Barack Obama, Obama or Barack Hussein Obama, OpenCalais will suggest not only the partial or extended name it found, but also the more frequently used Barack Obama.
New Entities, Facts and Events in English, including:
- New Natural and Manmade Disaster attributes that reveal these disasters’ effects
- Supporting data for upcoming events that will enable OpenCalais to recognize new Movies, Music Albums, etc., as well as anticipated Medical Treatments
- More Political Events and new items such as Diplomatic Relations, Political Endorsements, Poll Results and Voting Results
- Enhanced Person Career extraction that includes political party affiliations where those are included in the text.
The 4.3 release also features improved Simple Format and Microformat outputs, as well as several extraction bug fixes. For technical details, please see the full release notes here.
Tip Top Tagaroo!
By popular demand, we have also made two key improvements to our Tagaroo plugin for WordPress.*
Tagaroo no longer suggests tags while you type (which many users had found disruptive). Instead, you simply click a button when you are ready to see tag suggestions, and then select the tags you want.
We also added the ability for you to select / highlight a portion of text in your post and get tag suggestions for that text alone, in the event that you don’t want to tag the whole post.
*Note: You must have a hosted site – or your own server – where you have access to install WordPress plugins. Blogs hosted on WordPress.com won’t work with Tagaroo.
SemanticProxy Support for JSONP
Finally, we have also made two important changes to our URL submission tool, SemanticProxy.com.
First, we normalized the service to ensure that the results it returns are identical to those you would get by programmatically submitting content to OpenCalais via the API. This entailed discontinuing the HTML cleaning that SemanticProxy did on its own, which had occasionally caused the service to return false content-type errors.
Second, we added the ability for you to specify a callback function when requesting results in JSON format. Callbacks are useful for web service requests in client-side JavaScript and provide a relatively simple way to invoke web service requests across domains.
To use it, specify the JSONP output format followed by the callback function name. For example: http://service.semanticproxy.com/processurl/licenseid/jsonp:processresul...
The JSONP output format wraps the resulting JSON output in parentheses, preceded by the provided callback function name. For the request above, the wrapped results would be: processresults({"doc":{"info":...json output...”} );
Because JSON is valid JavaScript object-literal syntax, the wrapped response runs as an ordinary function call that passes the JSON object to processresults. In other words, if your JavaScript defines a processresults function that takes a JSON object as input and manipulates it, that function will be invoked automatically when the response loads.
The attached demo HTML page shows this in action.*
*Note that the demo will not work in IE. Also, we have restricted the volume for this demo key. If it does not resolve, please plug in your own OpenCalais API key.
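For readers who want to see the wrapping spelled out, the following Python sketch builds and unpacks a response of the shape described above. The `to_jsonp` and `from_jsonp` helpers are hypothetical (they are not part of SemanticProxy), and the payload is a placeholder rather than real SemanticProxy output. In a browser the same effect comes from defining processresults and then loading the request URL in a script tag.

```python
import json
import re

# A JSONP callback must be a plain JavaScript identifier; rejecting anything
# else keeps a caller from injecting arbitrary script into the response.
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def to_jsonp(callback: str, payload: dict) -> str:
    """Wrap a JSON document in a call to `callback`, as a JSONP endpoint would."""
    if not _IDENTIFIER.fullmatch(callback):
        raise ValueError(f"invalid callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"


def from_jsonp(text: str, callback: str) -> dict:
    """Strip the `callback(...)` wrapper and parse the JSON inside."""
    body = text.strip()
    prefix, suffix = f"{callback}(", ");"
    if not (body.startswith(prefix) and body.endswith(suffix)):
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(body[len(prefix):-len(suffix)])


wrapped = to_jsonp("processresults", {"doc": {"info": {}}})
print(wrapped)  # processresults({"doc": {"info": {}}});
```

Validating the callback name is a common precaution for any JSONP endpoint, since the callback text is echoed straight into executable script.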
Jump in!
Let us know what you think about these updates. We appreciate your feedback and look forward to any suggestions you may have.
20:13 Reflections on 'Puzzlepieces – Comparing NLP APIs for Entity Extraction'
» OpenCalais - Official Blog
Michael Fagan undertook a review of various entity and term extraction tools this past weekend.
While the use case (content types, goals, etc.) of the test is not clear, we are heartened to see folks beginning to look at the growing variety of tools and services in the space.
You can read the full posting - as well as a commentary from our Tom Tague - here:  [faganm.com]
Tom's commentary is also excerpted below:
Michael:
Tom Tague from OpenCalais here.
Wow - you covered a lot of territory here and I could probably spend at least as much space responding to some of the points you've raised - but I’ll try to stay focused on just a few key items.
First of course is the use case - which you don’t reveal. And of course it’s difficult to evaluate tools without the intended use well understood. Are you “tagging” news? Analyzing a large corpus of documents for network effects?
For example - if your use case is simply entity extraction, then the volume of entities is rarely the goal - but rather a mixture of recall and precision. You can have perfect recall and low precision - or the reverse. The goal is to find the appropriate balance. If your use case requires named (i.e. typed) entities, then tools such as Yahoo should not be in the mix - they are term extraction engines, not entity extraction engines.
Facts & Events: I was also a little surprised that you stopped at entity/term extraction. Most real-world use cases want to understand what’s happening in the text and are heavily dependent on facts and events - an area in which OpenCalais shines. Facts and events reveal the relationships between entities, and make up the core elements of “aboutness,” which are key values / benefits that many use cases for semantic technology seek to derive.

Semantic Links: It appears you missed our connection to the Linked Data Cloud on your “Semantic Links” section. For a growing number of entities that we return, we also return links to a rapidly growing set of Calais-provided, Thomson Reuters information assets that follow the semantic Linked Data standard. These dynamically generated pages also provide relevant ‘sameAs’ links to key resources in the Linking Open Data cloud.
You can see these in our demonstration tool at http://viewer.opencalais.com: copy and paste in a news story that features a number of company names, hit submit, and view the extracted entities, facts and events in the left-hand rail. Then expand the companies, and click on one to find the Calais asset. For instance, see the Bank of America asset here: http://d.opencalais.com/er/company/ralg-tr1r/e80e12df-622c-3c3e-86dc-a3ffdcc39e25.html (Traditionally, we have also included sameAs links to DBpedia and Freebase, and those will be back. Right now we are adapting to a new format.)
RDF: While the general developer population may lack familiarity with RDF at this time, as you note you do, developers that work with large textual content sets are moving to learn it now. While a variety of alternate representation ideas are available - RDF is the W3C standard and provides the right transport layer for rich knowledge representation. The text/simple format you chose is designed to support simple tagging / entity extraction use cases and leaves much of the richness extracted behind.
Length: OpenCalais supports entry of text documents up to 100K in length. We’ve found this supports the vast majority of our users well while conserving systems resources.
Usage: We welcome the use of OpenCalais for commercial or non-commercial purposes and allow users to submit up to 50,000 documents per day at no charge. After that we need to discuss some sort of value exchange.
We could probably name several other tools - but you absolutely should include Zemanta in any entity extraction test case.
Again - thanks for putting in the time to compare tools. I’d encourage you to come back and revisit the subject in the future.
Regards,
Tom Tague, OpenCalais Initiative Lead
(@TomTague)
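Tom's point about balancing recall and precision can be made concrete with the standard definitions. The sketch below scores one document's extracted entities against a hand-labelled set; the entity sets are invented for illustration, and `score` is a hypothetical helper, not part of any of the tools compared.

```python
def score(extracted: set, relevant: set) -> tuple:
    """Return (precision, recall, F1) for one document's extracted entities."""
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


relevant = {"Ferrari", "Porsche", "Leipzig", "BMW"}

# A "return everything" extractor: perfect recall, poor precision.
noisy = relevant | {"GTB", "track", "car", "racing", "fast", "test"}
# A cautious extractor: perfect precision, lower recall.
cautious = {"Porsche", "Leipzig"}

for name, extracted in [("noisy", noisy), ("cautious", cautious)]:
    p, r, f = score(extracted, relevant)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# noisy: precision=0.40 recall=1.00 f1=0.57
# cautious: precision=1.00 recall=0.50 f1=0.67
```

Raw entity counts would rank the noisy extractor first; the F1 score, which balances the two measures, does not.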
21:18 Voting is now open for ReadWriteWeb's Top 10 Web Products of 2009
Big news!
OpenCalais has been nominated by ReadWriteWeb as one of the top Web products of 2009.
Please add your voice to the mix by voting in ReadWriteWeb's Top 10 Web Products competition, here: [www.readwriteweb.com]
You can select up to 10 nominees to support. We humbly request your consideration of OpenCalais.
Many thanks,
-Krista
18:25 How Publishers are Using OpenCalais
Based on some forum chatter, I realized that we haven't shared our latest publisher case studies - including The New Republic and Associated Content - here in the blog. Instead, we posted them on SlideShare and linked to them in the newsroom.
So I think it's worthwhile to also link to that PPT here - which we developed for the U.C. Berkeley Media Technology Summit at Google - and to detail below some of the ways that publishers are using the RDF that the OpenCalais service returns.
Get Efficient
• Streamline content ops to drive editorial productivity
• Automatically categorize content with both IPTC news codes & ‘social tags’ that use everyday terms
• Automatically tag the people, places, companies, facts & events in content
• Automatically integrate archived materials
Get Engaged
• Improve search & navigation to make it easy for readers to find what they want
• Automatically populate recommendation widgets & related stories sidebars
• Automatically create ‘topic hubs’ on trending issues & breaking news
• Automatically integrate relevant data, related media, information from Wikipedia entries, etc.
Get Smart
• Optimize search engine ranking through better SEO
• Inform advertising placement and drive click-through
• Improve syndication to search engines, news aggregators, ‘recommended reading’ apps, etc.
Get Specialized
• Triage content based on local relevance & impact
• Triage content based on preferences or behaviors
• Triage content based on topic, industry, special interests, perspective, etc.
Please feel free to post any questions below and we will be happy to jump in.
Best,
-Krista
2:37 Simple, high-level OpenCalais whitepaper
Want to learn more about our connection to the Linked Data Cloud?
Check out our simple, high-level whitepaper, and let us know if you have questions.
@OpenCalais on Twitter
1:55 The OpenCalais Partner page has arrived!
The OpenCalais Partner page has launched, and - once again - we'd like to thank the innovative publishers and creative entrepreneurs who have embraced OpenCalais and made us part of their sites and services.
Not on the list, but think you should be?
Email us at Partners (at) OpenCalais (dot) com with two sentences on your app, site or service, and send along a quality image of your logo.
We'll be glad to add you to the list and include you in our next blog post or press release.
16:53 The List of OpenCalais Implementations Grows
Add 10 to the list of innovative sites and services that use OpenCalais to reduce costs, deliver compelling content experiences and mine the social web for insight. See our press release for more details on each.
We are thrilled to recognize the following new sites and services that are changing the way we engage with news and the social Web. They join a growing number of others in media, publishing, blogging, and news aggregation who use OpenCalais.
The newest publishers, joining CBS Interactive / CNET, Huffington Post, DailyMe and others in using OpenCalais, include:
The New Republic – http://www.tnr.com – The new website uses OpenPublish, an OpenCalais-enabled, Drupal-powered content management system (CMS), to increase editorial productivity, improve search engine optimization, and drive reader engagement through features including faceted search, recommended reading sidebars and – coming soon – automatically generated topic hubs.
Al Jazeera English’s new blogging network – [blogs.aljazeera.net] – features Al Jazeera correspondents from around the world. All posts in the new blog network are semantically tagged using OpenCalais for optimal search and navigation. It also uses a Creative Commons license and allows users to sign in to comment using Facebook Connect, Twitter or OpenID.
Slate Magazine’s News Dots Network – [slatest.slate.com] – News Dots visualizes the most recent topics in the news as a concise network of related topics. Like a human social network, the news tends to cluster around popular topics, and most stories are more closely related than one might think. Behind the scenes, News Dots scans all the articles from major publications—about 500 a day—and submits them to OpenCalais to identify the relevant people, places, companies, topics, etc.
I *heart* Sea – http://iheartsea.com/ – is a hyperlocal news aggregation site that collects some of the best blogs in Seattle, especially those serving the Capitol Hill area. I *heart* Sea uses OpenCalais to automatically tag the keywords of the blog posts it aggregates, to make it easier to find related information.
Innovative new media monitoring and intelligence tools using OpenCalais include:
Tattler (app) – [tattlerapp.com] – is an open source topic monitoring tool for today's Web. Tattler finds and aggregates content from the Web on topics users ask it to monitor. Using OpenCalais and other Semantic Web technologies, Tattler mines news, websites, blogs, multimedia sites, and other social media like Twitter, to find mentions of the issues most relevant to users’ selected topics, making it easy for users to filter, organize, share, and take action on content gathered from the real-time Web.
Interceder – http://www.interceder.net – is a social media monitoring tool that makes it easy to track trending topics and search through the latest content from major news Web sites, blogs, Twitter and YouTube. Interceder uses the Daylife API, OpenCalais, Freebase, and Yahoo! Pipes to retrieve the latest news from major news websites.
AskJot – http://www.askjot.com – Ask Jot is a tool for analyzing web pages for keywords, and displaying them as links to search results from various services around the Web. Developed by John Wright of Wright Labs and formerly known as Semantalyzr, Ask Jot uses OpenCalais, The New York Times article search API, DBPedia, the Yahoo! Answers API, the flickr API and many more.
New services using OpenCalais to deliver intelligent content experiences include:
Feedly – http://www.feedly.com – This Firefox plug-in brings to life user-selected inputs from Google Reader, friendfeed, Twitter, RSS feeds and more in an easy-to-read and engaging magazine-style format. Feedly uses OpenCalais and other semantic technologies for clustering, linking and organizing the content experience in an intuitive fashion that is nicely integrated into the browsing experience.
OpenPublish – [www.opensourceopenminds.com] – Based on the popular open source publishing platform Drupal, OpenPublish is a next-generation CMS that has been tailored to the needs of today's online publishers (magazines, newspapers, journals, trade publications, broadcast and wire services). Developed by Phase2 Technology, it uses semantic metatagging from OpenCalais to streamline content operations, automatically create topic hubs and recommend related articles and archived ‘more from this author’ stories.
DocumentCloud – http://www.documentcloud.org – Founded by reporters from The New York Times and ProPublica, and funded by the Knight Foundation, DocumentCloud is a unique online resource that will offer public access to news reporters’ original source materials, including documents, media files and more. OpenCalais processes materials available through DocumentCloud to make it easy for users to explore connections between newsmakers, corporations, transactions and even quotations across documents and across the full collection of source information.
14:29 DocumentCloud Adds OpenCalais and 20+ Investigative Journalism Outfits
Good news for fans of OpenCalais and investigative journalism alike.
The DocumentCloud initiative – winner of this year’s largest grant from the John S. and James L. Knight Foundation – has lined up some two dozen partners, everyone from Thomson Reuters, The Wall Street Journal and The New York Times, to the ACLU National Security Project, The National Security Archive, the Center for Investigative Reporting and many more.
DocumentCloud is a unique online resource – found at http://www.documentcloud.org - that will provide public access to news reporters’ original source materials. It will debut in a beta version by the end of this year.
Here are some of the stories that resulted from the news. Please help us share them with friends to let everyone know that this powerful tool for citizen journalism is on the way.
Nieman Journalism Lab: DocumentCloud adds impressive list of investigative journalism outfits.
[www.niemanlab.org]
Journalism.co.uk: Thomson Reuters partners KNC winner DocumentCloud
The Knight Foundation: More news from DocumentCloud
NYConvergence: Newspapers, mags, non-profits partner with DocumentCloud
Editor’s Weblog: OpenCalais joins DocumentCloud; set to host a wealth of primary sources. [www.editorsweblog.org]
The New York Observer: The Atlantic, The New Yorker, Mother Jones, WNYC, More Join Data Archive Experiment DocumentCloud.
[www.observer.com]
18:41 10 Ways to Use OpenCalais Today
Here’s what’s up. Over the last several months you may have read about partnerships between OpenCalais and organizations like CNET/CBSi, The Huffington Post, Magus, Associated Content, and a variety of others. And we, of course, think those are great.
But we realized we’re not doing a good enough job of sharing some of the basic ways that people are getting value out of OpenCalais. We have the great opportunity to talk to a wide range of individuals and organizations every week and we think about this stuff – a lot – ourselves. So – the purpose of this blog entry (and more to come) is simply to start sharing our thoughts on what you can do with OpenCalais. We can’t share everything (NDAs and all that) – but we can share a lot.
This post is going to cover two things: 1) What OpenCalais does at a ridiculously high level and 2) a few starter ideas for what you can do with it. Follow-on posts will cover additional ideas and we’ll try to keep them coming at a good clip.
What Does Calais Do?
Several things, actually.
- It analyzes text you send it and extracts entities (people, organizations, geographies, etc.). In many cases, it links those entities to the world of Linked Data.
- It extracts facts – like the fact that John Doe is the CEO of Acme Corporation or such.
- It extracts events – like mergers, earning announcements, natural disasters and a bunch of others.
- It attaches a topic to the text as a whole, much like a newspaper would (Sports, Finance, Health, etc.).
- It creates SocialTags – our attempt to “tag” the article the way a human would in order to file it away somewhere.
There is a whole lot more going on in the background – but those are the basics.
Why is OpenCalais Unique?
- First, there are a lot of entity extraction tools out there. Some are good – some not so great. Entity extraction is fine (if a little mundane) – but the real power of understanding text comes from understanding Facts and Events. Calais is the only tool that does that well. It may be the only tool that does that at all.
- Second, it is high performance. We process millions and millions of transactions every day for a diverse set of users. On average they take less than 0.75 seconds to complete.
- Third, it’s free for up to 50,000 submissions per day for commercial or non-commercial purposes. If you need more or want an SLA we have commercial options and have deals in place with clients who process millions of transactions per day.
- Fourth, you can count on it being here. OpenCalais is provided by Thomson Reuters – the world’s largest content company. We’re not going anywhere – so you can feel confident building a business or solution on top of OpenCalais technology.
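If you want an application to stay inside the free daily allowance, a client-side counter is one simple approach. This is a sketch under assumptions: the 50,000 figure comes from the post, but the day boundary the service actually uses (UTC or otherwise) is not stated here, so the caller supplies the date. `DailyQuota` is a hypothetical helper.

```python
import datetime
from typing import Optional


class DailyQuota:
    """Count submissions per calendar day and refuse once the limit is reached."""

    def __init__(self, limit: int = 50_000):
        self.limit = limit
        self.day: Optional[datetime.date] = None
        self.used = 0

    def try_consume(self, today: datetime.date) -> bool:
        """Record one submission for `today`; return False if the day's allowance is spent."""
        if today != self.day:
            # New day: start counting from zero again.
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


quota = DailyQuota(limit=2)
day = datetime.date(2009, 12, 1)
print([quota.try_consume(day) for _ in range(3)])  # [True, True, False]
```

Passing the date in, rather than reading the clock inside the class, keeps the counter easy to test and lets you match whatever day boundary the service turns out to use.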
We’re not going to try and structure this as use cases for different groups (publishers, bloggers, museums, etc.) – rather we’re just going to talk about the general types of things you can do with OpenCalais. You’ll have to do the translation to your situation on your own. We’re also not going to put out the complete technical recipe – just the general concept. And – no smoke and mirrors – this is all stuff you can do today.
The first round…
Triage — A simple use that saves time and money. If you’re faced with a large influx of content (say press releases) and you have a staff reviewing them for material that might be relevant to you – OpenCalais can probably help. Send the document to OpenCalais, get back the metadata and apply a few business rules. If you only care about mergers and acquisitions, then filter for that type of event and throw the rest away. We have real-world cases where this has reduced the volume of material to be processed by 60% and improved the accuracy of results by 10%.
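Here is a minimal sketch of that triage rule, assuming each OpenCalais response has already been parsed into a simplified dict with an `events` list of `{"type": ...}` entries. That dict shape and the event-type names `Acquisition` and `Merger` are assumptions for illustration; check the OpenCalais release notes for the exact type names and the real response structure.

```python
from typing import Iterable, List

# Assumed event-type names for M&A activity.
MA_EVENT_TYPES = frozenset({"Acquisition", "Merger"})


def triage(docs: Iterable[dict], wanted: frozenset = MA_EVENT_TYPES) -> List[dict]:
    """Keep only documents whose extracted events include at least one wanted type."""
    kept = []
    for doc in docs:
        event_types = {event["type"] for event in doc.get("events", [])}
        if event_types & wanted:
            kept.append(doc)
    return kept


press_releases = [
    {"id": 1, "events": [{"type": "Acquisition"}]},
    {"id": 2, "events": [{"type": "EarningsAnnouncement"}]},
    {"id": 3},  # nothing extracted
]
print([doc["id"] for doc in triage(press_releases)])  # [1]
```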
Workflow — Triage with a twist. Use the metadata returned by OpenCalais to route documents to the right person and/or system based on the facts and events inside the document.
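Routing can reuse the same parsed metadata. In this sketch the routing table, queue names and event-type names are all hypothetical; a document that matches no rule falls through to a default review queue so that nothing is silently dropped.

```python
# Hypothetical routing table: extracted event type -> destination queue.
ROUTES = {
    "Acquisition": "deals-desk",
    "Merger": "deals-desk",
    "EarningsAnnouncement": "markets-desk",
    "NaturalDisaster": "breaking-news",
}
DEFAULT_QUEUE = "general-review"


def route(doc: dict) -> set:
    """Return every queue a document should go to, based on its extracted events."""
    queues = {
        ROUTES[event["type"]]
        for event in doc.get("events", [])
        if event["type"] in ROUTES
    }
    return queues or {DEFAULT_QUEUE}


doc = {"events": [{"type": "Merger"}, {"type": "NaturalDisaster"}]}
print(sorted(route(doc)))  # ['breaking-news', 'deals-desk']
```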
Content Enhancement — There’s a whole world of Linked Data out there and OpenCalais can be your entry point. For example – take in press releases, and extract the companies mentioned in them. Use OpenCalais’ Linked Data entry points to get the SIC codes and the link to DBPedia. Access DBPedia and enhance your content with other information about the company like locations, people, products. Access Geonames to figure out what region the company is located in. Take that enhanced content and do cool things (like triage and workflow and presentation) with it.
Alerting — Give users the ability to be alerted when certain types of content become available. Unlike simple keyword alerting, with OpenCalais + Linked Data you can construct alerts like, “Tell me when there is M&A activity for a company in the Steel industry.”
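An alert like that combines two lookups: the event types OpenCalais extracts and an industry attribute obtained from Linked Data. In this sketch the document shape (events carrying a `companies` list) and the pre-fetched `industry_of` mapping are assumptions; in practice the industry would come from dereferencing each company's Linked Data URI, which is not shown. The company names are invented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Alert:
    """A standing query: these event types, involving a company in this industry."""
    event_types: FrozenSet[str]
    industry: str


def matches(alert: Alert, doc: dict, industry_of: Dict[str, str]) -> bool:
    """True if any wanted event in `doc` involves a company in the alert's industry."""
    for event in doc.get("events", []):
        if event["type"] not in alert.event_types:
            continue
        companies = event.get("companies", [])
        if any(industry_of.get(company) == alert.industry for company in companies):
            return True
    return False


steel_mna = Alert(frozenset({"Acquisition", "Merger"}), "Steel")
industry_of = {"Acme Steel": "Steel", "Example Bank": "Banking"}  # invented data
doc = {"events": [{"type": "Acquisition", "companies": ["Example Bank", "Acme Steel"]}]}
print(matches(steel_mna, doc, industry_of))  # True
```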
Media Monitoring — Whether you’re a media monitoring company or do it for your own company – it just got easier. Take in a content feed (social media, press releases, news), use OpenCalais to categorize and organize it, then put the results in a database and set some trigger levels.
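The "set some trigger levels" step might look like the following sketch, which counts how many documents in a batch mention each tracked entity and reports those at or above their threshold. The `entities` field and the threshold values are assumptions for illustration; a production system would more likely run this over a time window in the database.

```python
from collections import Counter
from typing import Dict, Iterable


def triggered(docs: Iterable[dict], thresholds: Dict[str, int]) -> Dict[str, int]:
    """Return {entity: document_count} for tracked entities that reach their threshold."""
    # set() so an entity mentioned several times in one document counts once.
    counts = Counter(name for doc in docs for name in set(doc.get("entities", [])))
    return {name: counts[name] for name, limit in thresholds.items() if counts[name] >= limit}


batch = [
    {"entities": ["Acme Corporation", "John Doe"]},
    {"entities": ["Acme Corporation", "Acme Corporation"]},
    {"entities": ["Example Bank"]},
]
print(triggered(batch, {"Acme Corporation": 2, "Example Bank": 3}))  # {'Acme Corporation': 2}
```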
Content Harmonization — Are you managing content from diverse sources? OpenCalais can take content from multiple sources (different news feeds, different museum collections, etc.) and apply a consistent set of metadata tags to all of it. Pop it in your content management system and you can treat it all as one harmonized content asset.
Automated News Portals — Want to create a general purpose news portal? Or maybe one that deals only with baseball news? Great. Subscribe to and/or acquire some content sources, and feed them through OpenCalais. Then use the metadata to throw away what you don’t care about and to organize the rest by topic, geography, person – whatever. A great example of an off-the-shelf solution that does this is OpenPublish.
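The "throw away what you don't care about, organize the rest" step can be sketched as a filter plus a group-by on the document-level topic. The `topic` and `title` fields are an assumed simplified shape, and the headlines are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_sections(docs: Iterable[dict], keep_topics: set) -> Dict[str, List[str]]:
    """Group headlines by topic, discarding documents outside `keep_topics`."""
    sections = defaultdict(list)
    for doc in docs:
        topic = doc.get("topic")
        if topic in keep_topics:
            sections[topic].append(doc["title"])
    return dict(sections)


feed = [
    {"topic": "Sports", "title": "Late homer wins the pennant"},
    {"topic": "Finance", "title": "Steelmaker agrees to merger"},
    {"topic": "Sports", "title": "Trade deadline rumours"},
]
print(build_sections(feed, {"Sports"}))
# {'Sports': ['Late homer wins the pennant', 'Trade deadline rumours']}
```

The same grouping works for geography or person by swapping the key, which is how one feed can back several portal sections.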
Finer-Grained / Higher-Value Syndication — Do you have content consumers via RSS or other syndication methods? Give them a better experience by allowing them to create their own channels based on OpenCalais metadata. Create channels based on region, types of events, companies, etc. – or any combination of those and other items.
SEO — Something we get asked about all the time – we know people are experimenting – but they’re not being very public about their experimentation. Here’s a simple idea though: make your content more search friendly. Two routes: One easy, one a little harder.
Route 1: Translate events into human-readable text and get it on your page. Have a complicated article about an LBO of company x by people y? OpenCalais will identify an M&A event. Take that event and turn it into a tag like “Acquisitions” – something people might actually search for. Don’t just use it as a metatag – incorporate it into the page via navigation or whatever so Google pays attention.
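Route 1 can be sketched as a lookup from event types to everyday search phrases, rendered as visible navigation links rather than hidden meta tags. The mapping, the event-type names and the `/tag/` URL scheme are hypothetical.

```python
import html
from typing import List

# Hypothetical mapping from extracted event types to phrases people search for.
SEARCH_LABELS = {
    "Acquisition": "Acquisitions",
    "Merger": "Mergers",
    "EarningsAnnouncement": "Earnings",
}


def search_tags(doc: dict) -> List[str]:
    """Human-readable tags for the events in a document, sorted for stable output."""
    labels = {
        SEARCH_LABELS[event["type"]]
        for event in doc.get("events", [])
        if event["type"] in SEARCH_LABELS
    }
    return sorted(labels)


def nav_links(tags: List[str]) -> str:
    """Render tags as visible navigation links (the '/tag/' path is hypothetical)."""
    links = []
    for tag in tags:
        slug = tag.lower().replace(" ", "-")
        links.append(f'<a href="/tag/{html.escape(slug)}">{html.escape(tag)}</a>')
    return " ".join(links)


doc = {"events": [{"type": "Acquisition"}, {"type": "Acquisition"}]}
print(nav_links(search_tags(doc)))  # <a href="/tag/acquisitions">Acquisitions</a>
```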
Route 2: Use Linked Data to enhance your content. If you’re talking about a company or geography, use OpenCalais Linked Data to enhance the page with additional information from DBpedia, Geonames, the CIA World Factbook or a bunch of other sources.
New Presentation Metaphors — With consistent metadata extraction across your content you can implement new navigation and search tools. Two examples. The Powerhouse Museum (here’s an example) tags everything in their collection and shows them as search terms. Some interesting insights emerge. Second example: Slate Magazine processes the day’s news and creates a network diagram of what’s connected to what (here it is). Pretty interesting. Our recommendation here: unless your audience is researchers, start simple and expand – it’s easy to overwhelm the average user with novel graphics, etc.
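A network diagram like Slate's can be built from per-article entity lists by counting how often two entities appear in the same article. This is one common way to do it, not a description of how News Dots actually works; the entity sets are invented.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_edges(articles: Iterable[Set[str]],
                       min_weight: int = 1) -> Dict[Tuple[str, str], int]:
    """Weight each entity pair by the number of articles mentioning both."""
    edges = Counter()
    for entities in articles:
        # sorted() gives each unordered pair a single canonical key
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return {pair: weight for pair, weight in edges.items() if weight >= min_weight}


articles = [
    {"Barack Obama", "Congress", "Health care"},
    {"Barack Obama", "Health care"},
    {"Congress", "Budget"},
]
print(cooccurrence_edges(articles, min_weight=2))  # {('Barack Obama', 'Health care'): 2}
```

Raising `min_weight` is a simple way to follow the "start simple" advice above: only the strongest connections survive, so the diagram stays readable.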
Looking Ahead
That’s just a start. We’ll keep publishing new ideas as we get the time. There are about 15 additional ones on the list today that we haven’t covered here – and more will be coming in the next few weeks.
And remember – these are building blocks. Mix and match them to create something really cool, and let us know what you’re doing.
13:44 Welcome to September!
We have two pieces of news today: the first is just plain exciting (smart people getting creative with OpenCalais) and the second will make enterprise folks’ work with OpenCalais a lot easier.
1.) Associated Content, the NYC-based open content network, has joined as an OpenCalais partner. They are using OpenCalais to inform ad placement, providing readers with a more contextually relevant experience, and – in a new twist – to assign the right story to the right reporter. See the Adweek story here: [bit.ly] and see the full press release here: [bit.ly]
2.) Oracle has integrated OpenCalais into the new Oracle Database 11g Release 2, whose debut they announced this morning. Specifically, OpenCalais is a seamless option in the Spatial RDF portion of the new database, which is a scalable, secure and reliable metadata management platform. See the ZDNet story here: [bit.ly] and see the full press release here: [bit.ly] If you are an Oracle user and want to learn more, see our Oracle FAQ here: [bit.ly]
Remember to join us on our new Facebook ‘Fan Page,’ as we will close down the old Facebook 'Group' at the end of this month. (Click here: [bit.ly]) You can also join us on Twitter:
[www.twitter.com]
[www.twitter.com]
[www.twitter.com]
[www.twitter.com]
Best,
-Krista
16:29 Increased Transaction Allowances
Hi all – just a little snippet of news today.
We’ve been busy working to keep up with the increased demand for OpenCalais services. A lot of engineering has been going on in the background around scalability, performance and reliability.
The good news is that it is all paying off. Over the last three months we’ve achieved 100% uptime while improving performance and delivering a number of functional improvements in the service - all while seeing growing utilization.
One result of our recent work is that the system is more efficient and scalable. Given our goal of connecting the world’s content – scalability is more than a little important.
And we’d like to share some of those improvements with the OpenCalais community. While we can’t offer everyone unlimited usage – we can turn the dial up a bit. So – effective immediately we’ve increased the daily transaction allowance for OpenCalais to 50,000 transactions per day – a 25% increase.
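Assuming the 25% is measured against the old allowance, the previous daily limit (not stated in the post) works out as:

```latex
1.25\,x = 50{,}000 \quad\Longrightarrow\quad x = \frac{50{,}000}{1.25} = 40{,}000 \text{ transactions per day}
```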
Of course, OpenCalais continues to be offered at no charge for commercial or non-commercial use. You don’t need to serve ads, embed links you don’t want or anything else. All we ask is that you share your stories of how you’re using OpenCalais with the world.
We’ve signed quite a number of agreements for OpenCalais Professional recently (same service, more transactions and an SLA). Based on what we’re hearing from our users we’re also working on some new pricing models for startups and (enormously) high volume media monitoring applications – more to come soon. If you’re interested in our Professional solutions please drop us a note at professional@opencalais.com and we’ll get back to you right away.
1:22 The Y Combinator Future of Journalism Challenge Spells Opportunity
The Y Combinator's 'Future of Journalism' challenge presents a great opportunity for OpenCalais developers. The seed-stage venture group has turned the conventional approach to journalism on its head, calling for sustainable online business ideas that can also support journalistic endeavors.
They ask: What would a content site look like if you started from how to make money—as print media once did—instead of taking a particular form of journalism as a given and treating how to make money from it as an afterthought?
Read more about it here, and remember how OpenCalais can help:
OpenCalais uses natural language processing (NLP) to “read” an article, extracting the ‘who, what, when, where and how’ from the story. Breaking content down into its basic elements makes it easier to manipulate – automating the creation of topic hubs and microsites – and improves its search relevance. With OpenCalais, publishers can:
- Automate: Automatically tag the entities, facts and events in content to increase its value.
- Enhance: Enrich content with open data from Wikipedia, Shopping.com, Geonames and more.
- Engage: Optimize the user experience, increase engagement and drive repeat visits.
- Extend: Increase reach to new search engines, aggregators, ‘related stories’ apps and more.
- Connect: Compete in tomorrow’s media ecosystem of enriched and interconnected content.
4:01 OpenCalais version 4.2 Available Now
We noted in our OpenCalais 4.1 press release that version 4.2 would follow swiftly, and here it is! The upgrade, which began rolling out yesterday, is automatically available to all OpenCalais users - with no changes required on your part.
As Tom noted in his original 4.1 blog post, the 4.2 improvements include:
New Granularity in News Categories (4.2)
We’ve added a number of additional news categories. Our news topic categorization capabilities have been expanded and now map quite closely to 17 top levels of the IPTC Newscodes.
OpenCalais Entity Extraction in Spanish (4.2)
OpenCalais now supports entity extraction in Spanish. The detailed list of supported entity types is covered in the release notes.
Linked Data Breadth, Depth and Access (4.2)
We’ve significantly upgraded the Linked Data URIs for Company and the content is refreshed more frequently.
If you’re investigating Linked Data – particularly around Companies – you’ll find the changes especially useful. With one call to OpenCalais and a few HTTP fetches you can build a complete picture of a company, its industry, its competitors, its location and many other items.
We’re also exposing our Linked Data endpoints in a new format: JSON. In addition to HTML and RDF you can now retrieve companies, geographies and any other Linked Data URI as JSON by appending .json to the URI or calling us with an appropriate caller type.
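As a sketch of that URI rewrite, the helper below appends `.json` to a Linked Data URI's path. One assumption is flagged loudly: it first strips an existing `.html` or `.rdf` suffix (as on the Bank of America page URL quoted earlier in this feed), on the guess that the resource URI is the path without that suffix; verify this against the actual service before relying on it. `json_representation` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these suffixes name alternative representations of the same resource.
_REPRESENTATION_SUFFIXES = (".html", ".rdf")


def json_representation(uri: str) -> str:
    """Return the URI for the JSON representation of a Linked Data resource."""
    parts = urlsplit(uri)
    path = parts.path
    for suffix in _REPRESENTATION_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    return urlunsplit(parts._replace(path=path + ".json"))


page = "http://d.opencalais.com/er/company/ralg-tr1r/e80e12df-622c-3c3e-86dc-a3ffdcc39e25.html"
print(json_representation(page))
# http://d.opencalais.com/er/company/ralg-tr1r/e80e12df-622c-3c3e-86dc-a3ffdcc39e25.json
```

Splitting the URL first means any query string stays after the new `.json` suffix instead of being mangled by naive string concatenation. The resulting URL can then be fetched with any HTTP client.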
15:37 New Version of Tagaroo Available
The latest version of Tagaroo is available here.
Below are some of the key new features in this release:
- Social tags are suggested as tags and appear at the top of the tags list in bold.
- Document categories are suggested as tags and appear at the top of the tags list.
- Many new entities and events are also suggested as tags. We have changed the descriptive text (what you see as a suggestion) for events, which unifies the suggested tags, so that, for example, 'Trial', 'Conviction' and 'Arrest' appear as 'Judicial Event' instead of as three separate tags.
- Entity tags are sorted by their relevance scores. You can control the 'Relevance Score' threshold from Tagaroo's settings, so that entity tags with a lesser relevance score do not appear as suggestions.
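The ordering described above (social tags first, then document categories, then entity tags sorted by relevance and filtered by a configurable threshold) can be sketched as follows. This is an illustration only, not Tagaroo's code; the input shapes, the 0.2 default threshold and the relevance values are assumptions.

```python
from typing import Dict, List


def suggest_tags(social: List[str], categories: List[str],
                 entities: List[Dict], threshold: float = 0.2) -> List[str]:
    """Social tags, then categories, then entity names by relevance (highest first)."""
    kept = [e for e in entities if e["relevance"] >= threshold]
    kept.sort(key=lambda e: e["relevance"], reverse=True)
    ordered = social + categories + [e["name"] for e in kept]
    # Drop repeats while keeping each tag at its first (highest-priority) position.
    return list(dict.fromkeys(ordered))


entities = [
    {"name": "Leipzig", "relevance": 0.35},
    {"name": "Porsche", "relevance": 0.8},
    {"name": "Germany", "relevance": 0.1},
]
print(suggest_tags(["Auto racing"], ["Sports"], entities))
# ['Auto racing', 'Sports', 'Porsche', 'Leipzig']
```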
Check out the newly released Tagaroo and see how it can enrich your WordPress blog.
12:53 OpenCalais Release 4.1 Available Today
The Gist:
- Introducing Social Tags – a knowledgebase-driven tagging solution
- Entity extraction now supported in English, French and Spanish
- Significant improvements to Linked Data depth and breadth
- Introducing the “Recession Pack” of topical fact and event extraction
Over the last several months we’ve been hard at work in the boiler room doing a fair amount of engineering work on OpenCalais. While not exciting – we’ve made significant improvements to the system’s reliability and scalability.
Now it’s time to release some user-visible enhancements. OpenCalais 4.1 is released today and Release 4.2 will follow in just two to three weeks. Unlike our last several releases, these will be rolled out as simple updates to the existing web service. If you’re not interested in any of the new features – no changes are required to your application.
Here’s what’s coming in Release 4.1 and 4.2:
Folksonomies, Ontologies, Vocabularies and Stuff (4.1)
OpenCalais is a great semantic data extraction engine. If you write an article about the relative merits of Porsche and BMW at the test track in Leipzig, we’ll diligently identify Porsche and BMW as companies and Leipzig as a geography. We’ll create Linked Data URIs to represent these things and open up access to the Linked Data ecosystem so you can enhance your article with other content assets.
But… sometimes you just want a great description: the kind of tags a human would put on the article, like “Car racing” or “Automobiles”. The kind of tag that would, for example, be very searchable and therefore… SEO’able (that is definitely not a word).
In 4.1 we’re introducing OpenCalais Social Tags. Social Tags is our attempt to emulate how a human might tag the document. Social Tags does some fairly sophisticated analysis of your entire document and maps it to a knowledgebase based on Wikipedia and other assets. From that process we generate Social Tags.
We’d suggest you experiment with using them for content tagging and navigation – we’d also really like to see some experimentation around using the Social Tags as keywords for ad placement and meta-tags for HTML pages. Sounds like an opportunity for SEO and improved ad placement to us.
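As one illustration of the meta-tag idea, here is a minimal Python sketch that turns a list of Social Tags into an HTML keywords meta element. It assumes you have already pulled the tag names out of the OpenCalais response as plain strings; parsing that response is not shown, and `keywords_meta` is a hypothetical helper.

```python
from html import escape


def keywords_meta(social_tags):
    """Build a <meta name="keywords"> element from Social Tag names.

    Blank tags are skipped and duplicates removed, preserving first-seen order.
    """
    unique = []
    for tag in social_tags:
        tag = tag.strip()
        if tag and tag not in unique:
            unique.append(tag)
    # Escape &, <, > and quotes so tag text cannot break the attribute.
    content = escape(", ".join(unique), quote=True)
    return f'<meta name="keywords" content="{content}">'


print(keywords_meta(["Car racing", "Automobiles", "Car racing", " "]))
# <meta name="keywords" content="Car racing, Automobiles">
```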
Because it’s a new approach, Social Tags will require ongoing refinement and tuning. You can expect some strange results in its first few months out in the real world. When you see them, we’d love to hear about them in our Forum.
New Granularity in News Categories (4.2)
We’ve added a number of new news categories. Our news topic categorization capabilities have been expanded and now map quite closely to the 17 top-level IPTC subject categories.
OpenCalais Entity Extraction in Spanish (4.2)
Beginning with Release 4.2, OpenCalais will support entity extraction in Spanish. The detailed list of supported entity types is covered in the release notes.
Linked Data Breadth, Depth and Access (4.2)
We’ve significantly upgraded the Linked Data URIs for Company. The content is refreshed more frequently, company competitors are cross-linked to the appropriate OpenCalais Company URI and links to new information sources such as CrunchBase are now included. If you’re investigating Linked Data – particularly around Companies – you’ll find the changes especially useful. With one call to OpenCalais and a few HTTP fetches you can build a complete picture of a company, its industry, its competitors, its location and many other items.
We’re also exposing our Linked Data endpoints in a new format: JSON. In addition to HTML and RDF you can now retrieve companies, geographies and any other Linked Data URI as JSON by appending .json to the URI or calling us with an appropriate caller type.
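A minimal Python sketch of the JSON retrieval described above, using only the standard library. It assumes the ".json" suffix is appended directly to the end of the Linked Data URI, as described; the example URI is a placeholder, and the structure of the returned JSON is not documented here, so inspect it before relying on any particular field.

```python
import json
from urllib.request import urlopen


def json_uri(linked_data_uri):
    """Return the JSON form of a Linked Data URI by appending '.json'."""
    return linked_data_uri + ".json"


def fetch_linked_data_json(linked_data_uri, timeout=10):
    """Fetch and parse the JSON representation of a Linked Data URI."""
    with urlopen(json_uri(linked_data_uri), timeout=timeout) as response:
        return json.load(response)


# Placeholder URI; substitute a company or geography URI returned by OpenCalais.
print(json_uri("http://example.org/company/123"))
# http://example.org/company/123.json
```

Building a fuller company picture would then mean calling `fetch_linked_data_json` again on any related URIs (competitors, location) found in the first response.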
Opt-In Publishing of Document Metadata URIs (4.1)
Until now, OpenCalais has by default stored all document-level metadata as a Linked Data URI and made it accessible via a (secret) identifier. This is useful if you want to share document metadata with someone else by providing them with a list of URIs rather than a massive file. Beginning with Release 4.1 this will shift from our default behavior to an opt-in function. There’s no reason to make many millions of document-level URIs accessible if you don’t plan to share them later.
New and Enhanced Events and Facts (4.1)
Given the current… environment, many of our new events and facts focus on company performance and actions. We’ve added a wide range of event types including company accounting changes, labor issues, layoffs, earnings restatements, delayed filings and quite a few others. Go wild. We’re calling it the OpenCalais Recession Pack, and we hope it proves quite useless in the near future.
Bug Fixes
Yes, it’s true. We have had some bugs. Release 4.1 improves accuracy and reliability of a wide range of extractions and addresses a few specific processing errors we’ve discovered.
Wrap
Release 4.1 is a small but important milestone for OpenCalais. Over the past 18 months we’ve dramatically expanded the range and improved the quality of our entity and event extraction capabilities. We’ve achieved our goal of providing the highest-quality entity extraction toolkit available and done it at a scale that’s surprised even us. In addition to entity extraction we’ve invested in the heavy lifting necessary for sophisticated fact and event extraction – which sets the stage for a whole new class of significantly more sophisticated semantically enabled applications.
With the release of Social Tags we are taking this capability into a whole new arena. While our current Social Tags database is based on Wikipedia and other public assets, we don’t plan to stop there. Social Tags is a general-purpose solution that can apply to any knowledge domain. As we move forward we’ll be investigating opportunities to provide tagging in areas ranging from economics to environment to politics. Social Tags is just the first step down an exciting path.
Documentation for the new release has been updated and is located here. The release notes can be found here.
-
17:36 It's time to check your applications' compatibility with OpenCalais 4.1
» OpenCalais - Official Blog
We’ll be releasing version 4.1 of OpenCalais on Monday 15 June (and Release 4.2 about three weeks later!). This release has a number of new features – but for the moment we want to focus on making certain it is backwards compatible with your existing applications.
Beginning now, version 4.1 is exposed for compatibility testing at the API access point of http://beta.opencalais.com. Just point your application to this address, make sure everything works and then point back to the default http://api.opencalais.com.
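One low-effort way to make this switch is to keep the base address in configuration rather than in code. The sketch below assumes a hypothetical `CALAIS_USE_BETA` environment variable; only the two host addresses come from this post, and the beta host is retired after June 15.

```python
import os

DEFAULT_ENDPOINT = "http://api.opencalais.com"
BETA_ENDPOINT = "http://beta.opencalais.com"


def calais_endpoint(env=None):
    """Pick the API base address; CALAIS_USE_BETA is a hypothetical flag."""
    env = os.environ if env is None else env
    flag = env.get("CALAIS_USE_BETA", "").strip().lower()
    return BETA_ENDPOINT if flag in {"1", "true", "yes"} else DEFAULT_ENDPOINT


print(calais_endpoint({"CALAIS_USE_BETA": "1"}))  # http://beta.opencalais.com
print(calais_endpoint({}))                        # http://api.opencalais.com
```

Pointing back to the default address then only means unsetting the variable, with no code change.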
This beta will be available until the morning (EST) of June 15. At that point we’ll be rolling out the upgrade of OpenCalais across all of our server locations, and the beta servers will be retired.
What’s new in 4.1? Well, that’s a secret until we release on June 15. At that time we’ll publish another blog post, updated documentation and release notes. A few hints: a new language, social content tagging and semantic metadata uniquely tuned to current events.
-
0:11 Publishers Tap OpenCalais for Competitive Advantage
» OpenCalais - Official Blog
On Thursday, we announced that CBS Interactive / CNET is our first major media commercial partner for OpenCalais. You can see the press release here.
CNET is using OpenCalais for semantic analysis of its tech product reviews, award-winning news, and blog postings on consumer electronics and technology. The aim is to streamline CNET’s content operations, drive audience engagement, and further extend CNET’s reach across the Web.
In addition, CNET is joining us in publishing core data assets for public, programmatic use on the Linked Data cloud. CNET will make its original content available for public use, including tech product reviews on laptops, TVs, smart phones, and digital cameras; news articles and blog posts from its CNET News editorial staff; and parts of its core technology product catalog.
You can read more about CNET’s pioneering work in the ReadWriteWeb write-up.
You can also read how Huffington Post and DailyMe are using OpenCalais to create local microsites and offer greater personalization, respectively. DailyMe's Neil Budde offers more detail in this Now Possible story as well.
Let us know how you are using OpenCalais in time for SemTech 2009 (June 14-18) so that we can showcase your work!
-
1:57 PwC Technology Forecast: Spring 09
» OpenCalais - Official Blog
Click here to find an excellent, professional report from PricewaterhouseCoopers (the PwC Quarterly Technology Forecast) on the impact of the Semantic Web and Linked Data, primarily with an enterprise rather than a web focus.
It's well worth a read and sharing with others.
-
19:01 Calais Supports Google's Rich Snippets Metadata Harvesting
» OpenCalais - Official Blog
Interesting news yesterday from Google. Basically they’ve announced that their new Rich Snippets feature will harvest semantic metadata from web pages using Microformats or RDFa.
As is to be expected, Google is a little uncommunicative about whether they’ll harvest this from every web page that has it and exactly how they’ll use this information in displaying search results (or advertising). Still, it’s an important development: if they do something interesting with the content, it will give several million websites an incentive to begin embedding semantic metadata, which we think is a good thing.
Some time ago OpenCalais released Marmoset – a tool designed to provide automated microformat generation for the Yahoo crawler. Marmoset sits quietly and waits for Yahoo to come along. When the crawler arrives Marmoset tags the page using OpenCalais and inserts microformats at the end of the page for a subset of the detected entities. It’s not a perfect solution – the microformats should really be implemented in-line with the content – but it does serve the purpose of feeding Yahoo the metadata at no additional coding cost to the content publisher.
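The general idea can be sketched in Python. This is not Marmoset's code: the crawler tokens, the `calais-microformats` wrapper class and the choice of hCard markup for companies are assumptions, and the list of company names stands in for entities you would get back from an OpenCalais call.

```python
from html import escape

# Assumed User-Agent substrings: Yahoo's crawler identifies as "Slurp",
# Google's as "Googlebot".
CRAWLER_TOKENS = ("Slurp", "Googlebot")


def is_crawler(user_agent):
    return any(token in user_agent for token in CRAWLER_TOKENS)


def append_hcards(page_html, companies, user_agent):
    """Append hCard markup for detected companies when a crawler requests the page.

    `companies` is a list of company names, e.g. from an OpenCalais response.
    Regular visitors get the page unchanged.
    """
    if not companies or not is_crawler(user_agent):
        return page_html
    # "fn org" marks an hCard that describes an organization.
    cards = "".join(
        f'<div class="vcard"><span class="fn org">{escape(name)}</span></div>'
        for name in companies
    )
    block = f'<div class="calais-microformats">{cards}</div>'
    # Place the block just before </body>, or at the very end if there is none.
    pos = page_html.rfind("</body>")
    if pos == -1:
        return page_html + block
    return page_html[:pos] + block + page_html[pos:]
```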
Today we’re releasing an updated Marmoset that does the same thing for Google’s Rich Snippets. The tool and documentation are located here.
In the future we’ll look at more elegant solutions that embed RDFa directly in-line with the content. For the time being, Marmoset is a no-cost mechanism for jumping on the Google Rich Snippets platform.
Tom
-
2:51 Hope to see you at Web 3.0 & SemTech 2009
» OpenCalais - Official Blog
Just a quick note to say we hope to see you at one of two key industry conferences this spring. Here's what we're up to in May and June:

1.) The Web 3.0 conference takes place in New York City at the New Yorker Hotel on May 19 & 20, 2009.
We will be offering a media industry and publisher-centric presentation on day one of the show. We look forward to catching up with everyone on the east coast, and hearing about the semantic and Linked Data projects folks have been working on over the past year.
As the Web 3.0 conference producers note, the time is ripe for innovation: "In turbulent economic times, it is critically important to understand what opportunities exist to make our businesses run better."
We'll look to offer some hands-on ideas for how OpenCalais can help you do just that.

2.) The 5th Annual Semantic Technology Conference - aka "SemTech 2009" - takes place at the Fairmont Hotel in downtown San Jose, June 14-18, 2009.
We will help kick off this conference by sharing key insights, inspirations and ideas - as well as areas that demand greater innovation & collaboration - gleaned during OpenCalais' first year out.
We'll also offer our first ever 'OpenCalais user track' at SemTech, which will start on Tuesday, June 16th at 2 p.m. Here is the line-up as it stands today; the final version will be appearing on the SemTech site soon.
2:00 p.m. Learn how top publishers are using semantic technology to improve their sites and services, including the New York Times, Huffington Post, CBS Interactive and more.
3:15 p.m. Learn how to bring semantic technology into Drupal sites and get a live demo of the OpenPublish platform with Jeff Walpole & Frank Febbraro, Phase2 Technology.
4:30 p.m. Learn what’s new in Calais, and how to use it to extract metadata and harvest the Linked Data cloud with Tom Tague, Calais Initiative Lead & James Leigh of James Leigh Services Inc.
Let us know if you plan to be at either show - we'd love to meet face-to-face.
-
13:32 Check out the Resolved Entity Analyzer
» OpenCalais - Official Blog
Developers - we've posted information about the Resolved Entity Analyzer in the Showcase. It analyzes entities and outputs an XML file containing simplified data gathered from OpenCalais and Linked Data.
Check it out here.
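The Analyzer's actual output format isn't described here, so the following is only a Python sketch of the general idea: collapsing resolved entities into a compact XML file with the standard library's `xml.etree.ElementTree`. The element names and the (type, name, URI) input tuples are hypothetical.

```python
import xml.etree.ElementTree as ET


def entities_to_xml(entities):
    """Serialize (entity_type, name, linked_data_uri) tuples as compact XML.

    Entities without a resolved Linked Data URI are written without a <uri>.
    """
    root = ET.Element("entities")
    for entity_type, name, uri in entities:
        entity = ET.SubElement(root, "entity", {"type": entity_type})
        ET.SubElement(entity, "name").text = name
        if uri:
            ET.SubElement(entity, "uri").text = uri
    return ET.tostring(root, encoding="unicode")


print(entities_to_xml([
    ("Company", "BMW", "http://example.org/company/bmw"),
    ("City", "Leipzig", None),
]))
```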
-
19:04 Happy days are here again...
» OpenCalais - Official Blog
As you may have noticed we've had some intermittent stability and response time issues over the last few days. We believe we have the problem solved - or at least quarantined.
We have one particularly high-volume user who submits a very wide range of content types to us. Due to some errors in the way they were using the API and some errors in the way we were handling errors (whew...), we were seeing system utilizations that were off the chart.
We've moved that user to their own little quarantine island until we get things worked out with them. As soon as we did this - the remainder of our servers paused for a moment, took a deep breath, and then went back to almost idle - where they belong.
So - things are looking good. We'll continue to keep a close eye on the system and make sure things stay settled down.
We learned a few things about how to debug these types of errors and will be faster in the future.
Thanks all for your patience.
Tom

