
Yaron’s “Semantic Minds” interview

I was interviewed by Lydia Pintscher of ontoprise for their monthly Semantic Minds series of interviews with people connected to Semantic MediaWiki. The interview was just published today; you can read it here.

In the interview I discuss WikiWorks, Semantic MediaWiki, Semantic Forms, Referata, Google, Wikipedia, KeyGene, Microsoft SharePoint, karaoke, collective intelligence and data silos, among other topics, along with a perhaps overly aggressive attempt to “keep it interesting”.

Categories: Uncategorized.

Tanatopedia launched


We are proud to announce (a few days late) the public launch of Tanatopedia, a new Spanish-language wiki about death and its presence in culture, religion, the funeral industry, etc. The wiki is sponsored and run by Serveis Funeraris Integrals, a Spanish/Catalan funeral-services company, and was implemented by WikiWorks, with the work done by David Gómez and Enric Senabre Hidalgo. The wiki makes extensive use of Semantic MediaWiki and related extensions.

Tanatopedia seems to be the first wiki of its kind, and has already gotten some pretty extensive publicity within Spain – here’s a nice article about it (Spanish-language) in La Vanguardia, a major Spanish newspaper.

UPDATE: David and Enric add the following:

Tanatopedia is funded and promoted by Serveis Funeraris Integrals (SFI), one of the most important funerary companies in Spain (located in the Barcelona area), but is open to other funerary companies, civil associations, scholars and anyone interested. The project won an internal call for innovation projects, and grew out of an original idea by Fidel Martínez, an SFI employee and wiki enthusiast.

We at WikiWorks were asked to collaborate on the project’s development by implementing Semantic MediaWiki technology, working together with Fidel to make Tanatopedia possible. We worked specifically on the data structure and forms for entering information about funeral services, facilities, companies, religious traditions (and their funerary requirements) and related artwork.

We set things up so that this information is stored in various semantic properties, and created section pages for each of these topics that dynamically display the content as it is added. Through the Drilldown extension, we also made it possible to explore the content and filter pages by these properties.

Tanatopedia has been open since last week, and is waiting for contributions.
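The property-based filtering that the Drilldown extension adds can be hard to picture from a description alone. Below is a minimal Python sketch of the general idea (faceted filtering), not Drilldown’s actual code; the page names, property names and values are hypothetical rather than Tanatopedia’s real data, and `filter_pages` and `facet_counts` are illustrative names.

```python
from collections import Counter

# Hypothetical pages and their semantic property values (illustrative only).
PAGES = {
    "Funeral Home A": {"Type": "Funeral home", "City": "Barcelona"},
    "Cemetery B": {"Type": "Cemetery", "City": "Barcelona"},
    "Cemetery C": {"Type": "Cemetery", "City": "Girona"},
}


def filter_pages(pages, filters):
    """Return the sorted names of pages whose properties match every filter."""
    return sorted(
        name
        for name, props in pages.items()
        if all(props.get(prop) == value for prop, value in filters.items())
    )


def facet_counts(pages, prop, filters):
    """Count each value of `prop` among the pages matching `filters`,
    i.e. the numbers a drilldown interface might show beside each option."""
    return Counter(
        pages[name][prop]
        for name in filter_pages(pages, filters)
        if prop in pages[name]
    )
```

For example, `facet_counts(PAGES, "City", {"Type": "Cemetery"})` counts one cemetery each in Barcelona and Girona; choosing Girona as well then narrows `filter_pages` to a single page.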


Semantic MediaWiki’s new look


I’m late with this news, but the beginning of this week saw a visual overhaul for Semantic MediaWiki: a new front page, a new logo (seen here, along with the old one), and a corresponding change to the skin and some of the other pages on semantic-mediawiki.org, to fit in with the new look. The new front page was the result of a long-running discussion among some of the Semantic MediaWiki developers, myself included, about how the SMW site could look more professional and more informative, and be more geared toward marketing to potential new users. The two most important changes are the big new “Download” button and the “Wiki of the month” feature, which is planned to show a different wiki every month, and which we hope will, over time, become a nice gallery of use cases. Plus, it’s a good way for SMW-using wikis to publicize themselves (apply here!).

I wasn’t involved with redoing the logo, but I definitely support the new look. The new logo neatly manages to suggest a whole host of things: a reference to the MediaWiki logo (another flower), plus structure, links, interaction with outside data (the bubbles in the background), and a little bit of Web 2.0 (the rounded border and the colors). And apparently the orange is supposed to match the orange of the main Semantic Web logo.


The old logo was clever in its own way, but it was very clearly based on the Wikipedia logo, and originated from a time when getting the software onto Wikipedia was the overriding goal of the SMW project. We’ve moved on a little bit, as they say.

Speaking of logos, I think the biggest thing the new front page is still missing is a list of notable companies and organizations that use the software (it’s a nice list already), whether that’s done as one of those big lines of logos or as a simple bulleted list. We’d need permission from various organizations to do that, but I think it’s definitely worth the effort. And actually, the same holds true for the WikiWorks site – we’re just about at the point where we could put together a nice list of clients. But that’s another story.


SMWCon in Amsterdam, the recap

The Fall 2010 Semantic MediaWiki Conference, or SMWCon, happened almost three weeks ago, but sadly I didn’t get to writing a recap until now. That has turned out well, though: the videos from the conference were uploaded just yesterday, so now I can link directly to the talks, which is nice, since I don’t have to do as much describing.

First, my overall impressions: I’ve been to four Semantic MediaWiki-related gatherings now (Boston, Karlsruhe, Boston again, Amsterdam), and I can definitely say that each event has been more serious and more focused than the last, thanks to the increasing maturity of the technology and growing awareness of it. This one was two full days of talks about serious corporate uses, and about SMW-based software that’s working and in use, with very little time lost between talks. It was a sort of relentless barrage of information, which was nice: even the lunch breaks had semi-structured discussions. At the end, there was discussion about turning SMWCon into a three-day event, to better handle all the interest and possibly allow time for some development work.

For all the talks, though, the most impactful part of the conference may have been the discussions that went on during the breaks, since there were a lot of relevant people and a lot to talk about. I know of a few different initiatives that may happen as a result of conversations there. I’ll just mention one, which is the project to add better RDF/triplestore support to Semantic MediaWiki. The basic issue is this: SMW stores and retrieves its data via the wiki’s standard relational database, which is usually MySQL. That works fine for most people, but almost since the beginning of the SMW project some people have been asking about being able to use what’s known as a triplestore instead: a database geared specifically for storing semantic triples of the kind that SMW deals in. Triplestores are superior to regular relational databases for handling triples because they allow for reasoning, inferencing and the standard academic stuff, something I’d known about for a while; but they also do faster querying, something I only found out recently.
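To make the relational-versus-triplestore distinction a little more concrete, here is a minimal Python sketch that stores triples in a single relational table using the standard `sqlite3` module, and answers a two-condition query with a self-join. The table layout, the data and the `events_hosted_in` function are assumptions made up for illustration; SMW’s actual database schema is different.

```python
import sqlite3

# A simplified, hypothetical layout: one (subject, property, value) table.
# This is NOT SMW's real schema; it only illustrates triples in a relational DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Amsterdam", "Country", "Netherlands"),
        ("Amsterdam", "Hosted", "SMWCon Fall 2010"),
        ("Cambridge", "Country", "United States"),
        ("Cambridge", "Hosted", "SMWCon Spring 2010"),
    ],
)


def events_hosted_in(country):
    """Find (city, event) pairs for cities in `country`.

    Each extra condition on the same subject costs another self-join of
    the triples table; a triplestore answers the equivalent SPARQL graph
    pattern, roughly { ?city :Country "..." . ?city :Hosted ?event },
    directly.
    """
    return conn.execute(
        """
        SELECT c.subject, h.value
        FROM triples AS c
        JOIN triples AS h ON h.subject = c.subject
        WHERE c.property = 'Country' AND c.value = ?
          AND h.property = 'Hosted'
        ORDER BY c.subject
        """,
        (country,),
    ).fetchall()
```

Every additional condition adds another join, which is one intuition for why a store built around triples can be a better fit for this kind of data.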

For a while, there have been two solutions to at least let you export the data from Semantic MediaWiki into a triplestore, where it could then be queried by standard semantic-web tools: one was a built-in component of SMW, while the other was a (non-open-source) extension. But by some coincidence, in the four months before this last SMWCon, three more extensions, all open-source, were created to allow integration with a triplestore in one way or another, including one, LinkedWiki, that was released just two days before the conference. So there’s clearly been an increase in interest, and SMWCon was an ideal place to discuss next steps, especially since almost all of the developers of the different extensions were there (and the developers of one of the extensions, “SparqlExtension”, gave a talk at the conference about theirs).
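Exporting to a triplestore generally means serializing the wiki’s data in an RDF format. As a rough sketch, the Python below writes (page, property, value) triples as N-Triples, one of the simplest RDF serializations. The base URL and the `iri`, `literal` and `to_ntriples` helpers are hypothetical; SMW’s built-in export uses its own URI scheme, and this is not the code of any of the extensions mentioned.

```python
from urllib.parse import quote

# Hypothetical base URL for wiki pages (not SMW's real export URIs).
BASE = "http://example.org/wiki/"


def iri(page):
    """Wrap a wiki page name as an N-Triples IRI (spaces become underscores)."""
    return "<" + BASE + quote(page.replace(" ", "_"), safe=":/") + ">"


def literal(text):
    """Quote a value as an N-Triples string literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (page, property, value) triples, one statement per line."""
    return "\n".join(
        f"{iri(s)} {iri('Property:' + p)} {literal(v)} ." for s, p, v in triples
    )
```

The output is plain text that most triplestores can bulk-load, which is part of why line-based formats like this are convenient for export.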

The current plan, then, as I understand it, is to use the knowledge gained from the creation of these extensions, and probably some of the code as well, to add to SMW the option of directly using a triplestore to both store and query its own data; a solution that should appeal to the semantic-web geeks and the corporate bean-counters alike. That project now has its own page, “SPARQL and RDF stores for SMW”. I’m looking forward to seeing what comes out of it.

As for the talks, let me briefly list some I found noteworthy, though I encourage everyone who wasn’t there (and who’s interested in SMW, of course) to check out all the videos.

  • Markus Krötzsch gave an interesting keynote address on SMW: Past, Present and Future. Strangely, for all the discussions I’ve had with Markus and Denny (the main developers of SMW) I didn’t know the full story of SMW’s origins until this talk. It’s public knowledge that SMW was first proposed at the 2005 Wikimania conference in Frankfurt (i.e. the first Wikimania), but I didn’t know that the proposal, and the idea, came about only because Markus and Denny wanted to attend that Wikimania, which was close by, and wanted to present something interesting. I also didn’t know that the first sponsors of the development of the software were inspired to fund it by that very talk, or that Markus and Denny originally didn’t plan to do any of the coding. (Unfortunately for anyone looking to make a film version of the story, none of the people involved have since sued each other. :) ) It’s strange to think how everything could have turned out differently if it weren’t for that one conference. Actually, I’d say it’s an argument for holding Wikimania more often in the standard tech centers of North America and Europe rather than in more far-flung places; but that’s a subject for another day.
  • I gave a talk too, about random stuff relating to Semantic Forms, including my “Semantic Classes” proposal, which actually seemed to get renamed halfway through the talk to “Semantic Schemas”, after some audience feedback. I’m still very much set on the idea, though the name’s pretty much up in the air.
  • Joel Natividad, from TCG, gave a talk that was really more like a problem statement. (It was also probably the best-put-together talk, featuring audio and video clips, etc.) The issue is workflow in SMW – being able to coordinate the actions of different components, to complete entire business processes. To take an example, a workflow could be: someone creates a document on a wiki, which causes a manager to get notified by email; the manager then reviews the document, edits it and approves it; at which point some other application has a field modified in its database and does something accordingly. As Joel pointed out, how best to solve that problem has been an open question in the Semantic MediaWiki world for over two years now; and now his company has to implement a workflow solution for a system that has SMW as one of its components, so it’s no longer theoretical. The consensus at the talk seemed to be that the right answer is to use a 3rd-party workflow application, but which one to use, or how best to use it, remains an open question. I’ll be very curious to hear what solution they end up going with (maybe by the next SMWCon?), because it may well serve as a template for other projects.
  • Rudi van Bavel of KeyGene gave a nice talk on importing millions of rows of data into Semantic MediaWiki and using the results. It was heartening for a lot of people in the audience, because he demonstrated that SMW can work well with that much data, and he gave some tips on how to configure the database to get good results.
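The example workflow from Joel’s talk can be modeled as a small state machine. The Python below is only a sketch of the problem under simple assumptions (hypothetical states, events and side-effect functions, with the email and the other application’s database replaced by log entries); it is not TCG’s solution, and the consensus at the talk was to use a third-party workflow application rather than hand-rolled code like this.

```python
from dataclasses import dataclass, field

# Allowed (state, event) -> next-state transitions for the example workflow.
TRANSITIONS = {
    ("draft", "submit"): "awaiting_review",
    ("awaiting_review", "approve"): "approved",
    ("awaiting_review", "reject"): "draft",
}


@dataclass
class Document:
    title: str
    state: str = "draft"
    log: list = field(default_factory=list)


def notify_manager(doc):
    # Hypothetical stand-in for emailing the manager.
    doc.log.append(f"email: please review {doc.title}")


def update_other_app(doc):
    # Hypothetical stand-in for modifying a field in another application's database.
    doc.log.append(f"db: {doc.title} approved")


# Side effects triggered when a document enters a given state.
ON_ENTER = {"awaiting_review": notify_manager, "approved": update_other_app}


def fire(doc, event):
    """Apply `event` to `doc`, run any side effect, and return the new state."""
    key = (doc.state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {event!r} a document in state {doc.state!r}")
    doc.state = TRANSITIONS[key]
    handler = ON_ENTER.get(doc.state)
    if handler is not None:
        handler(doc)
    return doc.state
```

Even this toy version shows why the question is hard: the transitions, permissions and side effects all have to live somewhere, and a wiki by itself has no natural place for them.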
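Rudi’s specific database tips are in the video. As a generic illustration of one common technique for large imports, batching inserts into transactions rather than committing row by row, here is a Python sketch using `sqlite3`; the table, the rows and the batch size are assumptions, and this is not KeyGene’s method or SMW’s own import code.

```python
import sqlite3
from itertools import islice


def batches(rows, size):
    """Yield lists of up to `size` rows from any iterable, lazily."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def bulk_import(conn, rows, batch_size=10_000):
    """Insert rows in batches, one transaction per batch; return the row count."""
    total = 0
    for batch in batches(rows, batch_size):
        with conn:  # commits when the block ends, rolls back on an exception
            conn.executemany(
                "INSERT INTO data (subject, property, value) VALUES (?, ?, ?)",
                batch,
            )
        total += len(batch)
    return total
```

Usage, with a hypothetical `data` table and generated rows: create the table with `conn.execute("CREATE TABLE data (subject TEXT, property TEXT, value TEXT)")`, then pass any iterable of 3-tuples to `bulk_import(conn, rows)`; because `batches` is lazy, the full data set never has to sit in memory at once.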

Note that I’m just mentioning the talks that especially struck me; other talks were interesting too, but I had already seen variations of them or knew the technology, so I’m not an ideal judge. On that note, let me plug the talk given by fellow WikiWorks member Jeroen De Dauw on the Maps and Semantic Maps extensions (he already uploaded it to YouTube). I’m very familiar with the technology, but if you’re not, it’s a real treat to see it in action.


Making Wikipedia into a database

Last month I was quoted in a Technology Review article about using Wikipedia’s data, and soon after that one of the editors of Hatilda Harevi’it (“The Fourth Tilde”), a Hebrew-language semi-monthly online Wikipedia-based newsletter, wrote to me, asking me to write something for Hatilda clarifying the various ideas presented in the article. I wrote something in English, which they dutifully translated into Hebrew (I can read Hebrew fine, but my writing leaves something to be desired). The latest edition, vol. 24, came out yesterday, with my column – you can see it here (look for “Semantic MediaWiki” :) ). It ended up being longer than I thought it would – it contained not just an overview of the concepts, but a technical proposal for Wikipedia.

And for the benefit of those of you who can’t read Hebrew, here’s the original version:

Making Wikipedia into a database

In July, the magazine Technology Review published an online article, Wikipedia to Add Meaning to Its Pages, about adding semantics to Wikipedia, that caused a little bit of a stir – I think for most people who read it, it was the first time they had heard about me, Semantic MediaWiki (SMW), or the consulting company WikiWorks (which is mentioned indirectly); for some, it may have been the first time they had heard of the Semantic Web. That article just provided a very brief summary of all the issues involved, so I’d like to give my view of things in more detail.

I see the history of Wikipedia as, in part, a progression from a collection of text articles into something more like a database. As the amount of information in Wikipedia has grown, the structure needed to support it has grown alongside it – an entire world of categories, infobox templates, navigation templates and list pages (which you can see taken to a logical conclusion, though not the most extreme one possible, with the English Wikipedia’s “Lists of lists” category). At the same time, the importance of Wikipedia as a source of data has also grown considerably. Three online projects, all mentioned in the article, are either completely or to a large extent based on using Wikipedia’s data: DBpedia, which puts the information from the English-language Wikipedia on the web in a format that computers can query directly; Freebase, which does something similar for information from many different sources, although Wikipedia is one of the largest; and Powerset (not “PowerSet”), which according to the article gets its Wikipedia information indirectly, via Freebase (I thought it built up its store of information by doing natural-language processing on the main text of Wikipedia articles – in either case, it’s based on Wikipedia). These projects have all done well for themselves – DBpedia is literally at the center of every graph of the world of “linked data”, Powerset was bought in 2008 by Microsoft, and Metaweb, the company behind Freebase, was bought by Google about a week after the article came out (I’m guessing that’s a coincidence).

This progression from text to data, by the way, reflects a larger overall trend in the web, a trend that people have generally referred to as the Semantic Web, and sometimes “Web 3.0”. The term “Semantic Web” has been used to mean many different things, and it’s itself the subject of controversy, but the very basic idea is that we should be able to have content from web pages accessed and understood directly by computers. If, for instance, I want to find the names of the 10 highest-paid actors who were born in Hungary, I should be able to enter my question into the computer in some way, and then have it go to the right sources for the different sets of information, put the information together, and give me back an answer. (Explanations for the Semantic Web often involve users finding plane tickets, but I thought I would give a more interesting example.) People have been talking about the Semantic Web for almost as long as there has been a web, but in the last five years it has really picked up, and now “Web 3.0” is starting to see the same kind of hype that “Web 2.0” once did.
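The actor example boils down to joining two data sets and ranking the result; the hard part on the real web is getting the data into machine-readable form at all. Assuming that were solved, the combining step might look like the Python sketch below, where the two dictionaries stand in for two separate sources and the names and figures are entirely made up.

```python
# Two hypothetical sources, standing in for separate sites on the web:
# one lists birthplaces, the other lists earnings (all values are invented).
BIRTHPLACES = {
    "Actor A": "Hungary",
    "Actor B": "Hungary",
    "Actor C": "France",
    "Actor D": "Hungary",
}
EARNINGS = {
    "Actor A": 12_000_000,
    "Actor B": 30_000_000,
    "Actor C": 50_000_000,
    "Actor D": 8_000_000,
}


def top_paid_born_in(country, n=10):
    """Join both sources on actor name, keep one country, rank by earnings."""
    candidates = [
        (name, EARNINGS[name])
        for name, place in BIRTHPLACES.items()
        if place == country and name in EARNINGS
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in candidates[:n]]
```

Here `top_paid_born_in("Hungary")` returns the three Hungarian-born entries, highest earner first; with real data, the join key (the actor’s identity) is exactly the kind of thing shared identifiers on the Semantic Web are meant to provide.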

So where does that leave Wikipedia? We’re at the beginning of some sort of online data revolution, and Wikipedia itself is the source for data projects worth tens of millions of dollars, yet Wikipedia’s own approach to data is quite basic – the same facts have to be manually entered by users over and over, at least once in every language and usually more than that. There is also very little ability to export any of the data in a machine-readable way. In short, it’s a wasted opportunity.

For those who want to improve access to Wikipedia’s data, one approach usually stands out: Semantic MediaWiki (SMW), which is an extension to MediaWiki (the software on which Wikipedia runs). It’s also a project that I’ve been involved with for four years. SMW is an extension that lets users easily store information found on the wiki within the wiki’s own database, so that that information can be queried, displayed (in tables, graphs, maps, calendars etc.) and exported elsewhere. I won’t describe SMW here in more detail than that, but if you want to read more about it, the FAQ is a good place to start. The FAQ mentions how Semantic MediaWiki in fact has its roots in a proposal for turning text into data on Wikipedia itself, and that getting SMW onto Wikipedia is still a major goal for some of its developers (though it was never a big goal of mine). Still, despite the Wikipedia connection, Semantic MediaWiki has taken on a big life outside of Wikipedia, and at this point it gets serious usage as a data-management tool within companies, government agencies and other organizations (helping such organizations to use it effectively is the main business of WikiWorks).

That’s SMW, very briefly; but let me change directions here: I may surprise people who know me by saying that I don’t believe Semantic MediaWiki is the right answer for Wikipedia at the moment. The biggest reason for that is that Wikipedia is itself a collection of over 200 sub-sites, each in a different language; and a single store of data that all of them can use is probably a better solution than what SMW could provide, which is a separate data store for each one.

What would such a database look like, then? It would have to fit some general criteria: the data would have to be easily modifiable by people who speak many different languages; the data would have to be usable in many different languages; it would have to be extremely fast; and ideally it could be usable even outside of Wikipedia, as a general-purpose data API.

For those who are curious, and have some understanding of technical concepts like APIs and parser functions, I present, in the appendix, one option for how it could be done: it would involve creating a new wiki at a URL like http://data.wikipedia.org, that would hold thousands (or more) of pages of raw data, probably in English; each data set would be in CSV format (which stands for “comma-separated values”), the simplest format that data can take. All these pages would be created by hand, by users. The wiki could then be queried, via each page’s URL, to get that page’s data. Querying would be done by each language Wikipedia (the values would also get translated into the right language, an issue I talk about in the proposal), as well as by any outside system that wanted to easily get data from Wikipedia.
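As a rough illustration of the proposal, here is a Python sketch that reads a CSV data page, looks up one value, and translates it for a given language. The page contents, the column names, the translation table and the `lookup` function are all hypothetical; in the proposal the CSV text would come from a page URL on the proposed data wiki, and the appendix has the actual details, including how translation would work.

```python
import csv
import io

# Hypothetical contents of a CSV data page (inlined here instead of fetched).
COUNTRIES_CSV = """name,capital,continent
Netherlands,Amsterdam,Europe
Poland,Warsaw,Europe
"""

# Hypothetical per-language translations for values stored in English.
TRANSLATIONS = {"de": {"Europe": "Europa", "Warsaw": "Warschau"}}


def lookup(csv_text, key, column, lang="en"):
    """Return one cell from a CSV data page, translated if a translation exists.

    Rows are matched on the "name" column; returns None if no row matches.
    Untranslated values fall back to the original (English) text.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["name"] == key:
            value = row[column]
            return TRANSLATIONS.get(lang, {}).get(value, value)
    return None
```

In this sketch, `lookup(COUNTRIES_CSV, "Poland", "capital", lang="de")` gives "Warschau", while the English default gives "Warsaw"; each language Wikipedia would only need its own translation table, not its own copy of the data.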

Is this a “semantic” solution? That depends on who you ask. For some people, “semantic” implies the use of very specific features: semantic triples, ontologies, and data formats like RDF and OWL. For me, that’s an academic discussion – all that really matters is finding a simple way to free up Wikipedia’s data for all sorts of interesting uses.

The appendix was kept untranslated; you can see it here for the full technical details.


Complex Operations, indeed

I found on Twitter (and we then re-tweeted) this video of Alper Caglayan explaining Semantic MediaWiki at the U.S. Naval Postgraduate School. I’m late on this – the talk happened over three months ago, and in the meantime Dr. Caglayan has already blogged about it, in a post that summarizes his main points.

He apparently works for the company Milcord, and the SMW-based wiki he talks about for most of the time is the Complex Operations Wiki, which holds a lot of information about, among other things, the tribes and geography of Afghanistan. The wiki’s potential uses are considerable, though as far as I understand, the data isn’t being used by anyone right now; at the moment it’s just a demo wiki, though its development was funded by people within the U.S. Department of Defense.

The video is nice (the really important part is roughly between minutes 25 and 35) – it’s always interesting to see how other people present the technology, and how audiences perceive it when hearing about it for the first time. I think one of the obstacles for the SMW/Semantic Forms/etc. system, maybe the main one, is that it’s so different from other technologies out there that it’s hard to explain what it does, and thus how useful it is, in ways people can understand. Dr. Caglayan goes with the approach of calling it a wiki whose data can be queried, which is reasonable, but doesn’t quite convey the experience of reading or editing the wiki. He shows a form in use, but doesn’t explain that the form in question didn’t require any custom programming to create. Then again, I know from firsthand experience that trying to explain the whole system takes a long time – I once gave a full 8-hour seminar on SMW and its related extensions, and I’ve seen a 3-hour talk about it that barely covered anything but the basics.

It’s heartening, though, to see the fairly positive reaction of the audience, who are mostly civilians but who seem to play a role in the military’s data policies. The U.S. military has the same data problems as just about any mid-sized-and-larger corporation, from data “silos” to information that may have lost its validity at some point in the past. Hopefully Semantic MediaWiki can be part of the solution.


New MediaWiki extension: Approved Revs

Approved Revs is my latest MediaWiki extension (with some important code contributions made by Jeroen and others), released about a week ago; version 0.2 just came out today. It’s a simple extension that just lets administrators mark a single revision/version of any wiki page as the “approved” one – so that, when users go to that page, what they see is the approved revision, not necessarily the latest one.

It’s a simple concept, and hardly original: you may be aware that there’s already an extension that does this – FlaggedRevs, which is already in use on a growing number of language Wikipedias; maybe a dozen currently – not yet the English-language one, but it’s probably just a matter of time. What’s different about Approved Revs is just its simplicity – FlaggedRevs puts in place an entire framework for evaluating the quality of any specific revision. That sort of framework approach makes sense for very large sites, like Wikipedia, where decisions about which version to approve have to be made out in the open, and agreed to by many people. For smaller sites, the framework of FlaggedRevs could be overkill – in fact, I first thought of creating a new extension after trying to install FlaggedRevs and getting scared off around the 3rd paragraph of the documentation (though to be fair, that’s what some people have said about Semantic MediaWiki as well).

Anyway, I think Approved Revs will be an important extension, because it enables an element of workflow, something that MediaWiki has generally lacked. When you create a website with a standard CMS solution like Drupal or WordPress, you can easily save a page or posting in draft form before it gets “published”, i.e. made viewable to the public. And you can have different user types, so that one set of people is responsible for writing the content, and another is responsible for approving it. In MediaWiki it’s a different world: whatever a user last wrote, whether your user base is a small group of employees or the whole world, is what everyone sees. This of course offers a big advantage in immediacy, but for some organizations it’s just not acceptable. So Approved Revs could open up the use of MediaWiki to content-management situations where previously it wasn’t a possibility.

And yes, if the page contains semantic data, it’s the data from the approved revision that gets stored by SMW, which is great. (The same behavior could be true of FlaggedRevs – I don’t know.)
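The behavior described above, where the approved revision (rather than the latest) is both displayed and used as the source of semantic data, can be sketched in a few lines of Python. This is a hypothetical model, not Approved Revs’ actual PHP code: the `Page` class and its methods are invented for illustration, and showing the latest revision when nothing is approved is this sketch’s assumption. The regular expression handles only SMW’s basic `[[Property::Value]]` annotation syntax.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Basic SMW inline annotations: [[Property::Value]] or [[Property::Value|shown text]].
ANNOTATION = re.compile(r"\[\[([^:\]]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")


@dataclass
class Page:
    revisions: list = field(default_factory=list)  # revision texts, oldest first
    approved: Optional[int] = None  # index of the approved revision, if any

    def edit(self, text):
        """Save a new revision and return its index."""
        self.revisions.append(text)
        return len(self.revisions) - 1

    def approve(self, rev):
        """Mark one revision as approved (an administrator action in the extension)."""
        if not 0 <= rev < len(self.revisions):
            raise IndexError(rev)
        self.approved = rev

    def displayed_text(self):
        """The approved revision if there is one; otherwise (in this sketch) the latest."""
        if self.approved is not None:
            return self.revisions[self.approved]
        return self.revisions[-1]

    def semantic_data(self):
        """Property values parsed from the displayed (approved) revision only."""
        return dict(ANNOTATION.findall(self.displayed_text()))
```

The key design point is that display and data share one source of truth: once a revision is approved, later unreviewed edits change neither what readers see nor what gets stored as semantic data.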


WikiWorks goes Rabbinic

Since this is my first post, I’ll start by introducing myself. I started my SMW consulting career with my own company named Tosfos Development, based in Queens, NY. When Yaron started WikiWorks and asked me to join, I jumped at the opportunity.

But that’s old news. In new news (That can’t be grammatically correct and if it is, it shouldn’t be. Hey, can I have a full sentence in parentheses smack in the middle of another sentence?) I am pleased to announce that in addition to my years of experience with the semantic technologies, I have just become an ordained Rabbi! Need a kosher wiki? Trim the pork from your budget! Got milk? Then don’t add meat! I’ll end here.

I guess this adds some more diversity to an already diverse team. We have consultants on three continents, which means we cover half the world (subtracting Antarctica – poor Antarctica!). We can handle pretty much anything, such as custom extensions, skins, PHP, Semantic Bundle, wiki hosting and theological questions.

Wishing a happy Independence Day to my fellow Americans!


Six conferences

Working with MediaWiki, and usually Semantic MediaWiki as well, we find ourselves at the intersection of a number of different worlds: Wikipedia, corporate wikis, the Semantic Web, etc. As a result, we’ll be at several conferences this year and next – here are the known ones:

  • First, one from the past: the Spring 2010 SMWCon happened just about a month ago, at MIT in Cambridge, Massachusetts. I mostly put the conference together, including getting the catering (it was a challenge on a tiny budget, but we managed – the key is not to order too much fruit). There were about 30 attendees, and some quite in-depth presentations and discussions. Everything was videotaped, and though the videos aren’t online yet, I have reason to believe that they’ll be up quite soon.
  • The next SMWCon is already scheduled, and happening not long from now – it’ll be September 18-19, in Amsterdam. I’ll be there, along with possibly two other members of WikiWorks. This one looks to be a big deal, judging from the enthusiasm of the planners (not me, thankfully).
  • SemTech, the big annual semantic conference in California. None of us are there this year (it’s happening now), but I think we’ll try for next year.
  • Wikimania – the annual convention relating to all Wikimedia projects, but mostly Wikipedia – is coming up; it’ll be in Gdansk, Poland, in about two weeks. Unfortunately, I won’t be going, unlike the last two years – it just felt like too much, with everything else that was going on. But WikiWorks member Jeroen De Dauw will be there, and will be making several presentations, including one about his extensions Maps and Semantic Maps, and another about his very-eagerly-anticipated Extension Management Platform, that he’s working on now.
  • Speaking of Wikimania, the location of next year’s Wikimania was recently announced – it’ll be in Haifa, Israel next summer. It’s amazing to me that Wikimania is coming to Haifa – I grew up there in the early 1980s, I used to go back with my family fairly often after that, and the city and country are still very dear to my heart. I never thought my professional career would end up taking me back there; I plan to be there, if at all possible, and it may well be an emotional experience.
  • That brings us to RecentChangesCamp, aka “RoCoCo” (that’s the French name for it), which is happening this weekend in Montreal. It covers wikis in general, although judging from the attendees list it looks like at least half the people there will be from the MediaWiki world. I’ll be there, and I’m very much looking forward to it – it’ll be nice to talk to people from different parts of the wiki world – other people involved in development, hosting, design, and all the other things WikiWorks does on a daily basis, though not usually with much input from others. And Montreal is a great city – one of the nicest in the world, I think.


Vector-y at last

The big news today in the MediaWiki/Wikimedia world was that the default look for the English-language Wikipedia (i.e., the one shown to non-registered users, and to logged-in users who haven’t modified their defaults) changed – the logo was made smaller and a little darker, and, more importantly, the default skin was changed from “Monobook” to “Vector”.

I’ve personally known about Vector for about a year now, and six months ago I changed my own wiki, Discourse DB, to use the skin as a default, when I upgraded it to MediaWiki 1.16 (the current skin is basically just Vector done in olive-green). I personally think Vector has a fantastic look – very clean, with bigger, more obvious tabs, and an easier-to-find search area (in the top right, instead of lower down in the sidebar). That last decision, to move the search input, seems to have been a controversial one among Wikipedia users – if you read through the comments in the “new look” blog post, they’re full of pleas to restore the search input to its previous location, with one commenter helpfully stating that “The guy who decided to move the search box should be fired immediately and banished from the IT industry forever.” I’m not a usability expert, but it’s my understanding that the usability tests done by the team who created the Vector skin (many of whom I’ve met, actually) showed that the old location was hard to find for many new users – people have come to expect a search entry right at the top.

There are other complaints that seem more valid – like that parts of the sidebar show up as minimized and require you to click on an arrow in order to see them, which does seem pointless. Thankfully, this appears to be a design decision unique to Wikipedia, and isn’t the default behavior of the Vector skin. There are also complaints about the font size, which I haven’t been able to duplicate – the font size seems the same on my screen as the old version.

Outside of Wikipedia, people seem to like Vector – already we see newly-launched wikis, like Startup Linkup and Innovation Cell, using Vector; and we’re talking now to a client who wants a custom skin, and who’s decided to go with Vector as the basis for it instead of Monobook. It should be noted that this is all happening even though the first version of MediaWiki in which Vector is included, 1.16, hasn’t been officially released yet – it’s been available for a long time, but it’s still in beta, and probably will be for another few months.

What will all this mean for those of us in the MediaWiki business? I think it’s great news – the overall look of MediaWiki has, in my opinion, been one of its weaknesses for a while; even though everyone knows the look from Wikipedia, it’s still been considered clunky, and difficult for people trying to edit for the first time to understand. The tremendous power and flexibility of MediaWiki have tended to outweigh some of the problems with the appearance. But now it looks – dare we say it? – nice.
