Data is data

My new motto for 2013 is: “data is data”. What does that mean (aside from being a tautology)? It means that data has a set of behaviors and a “personality” all its own, very much distinct from its underlying subject matter, or from what format it’s in. Ultimately, the fact that a table of data holds an ID field, three string fields and a date, and that it has 300 rows, says more about how to display and interface with it than the fact that it’s an elementary school activity listing, or a description of video game characters, or top-secret military information, or the results of biotech research. And the fact that the data is in an Excel spreadsheet, or a database, or embedded as RDF in a web page, shouldn’t matter either.

What is data?

I should first define what I mean by data. I’m talking about anything that is stored as fields – slots where the meaning of some value can be determined by the name or location of that value. And it should be information that represents something about the outside world.

For that latter reason, I don’t think that information related to communication – whether it’s email, blog posts, microblogs and status updates, discussion forums and so on – is truly data, though it’s usually stored in databases. It’s not intended to represent anything beyond itself – there’s data about it (who wrote it and when, etc.), but the communication itself is not really data. Tied in with that, there’s rarely a desire to display such communications in a venue other than where it was originally created. (Communication represents the vast majority of information in social networking sites, but it’s not the only information – there’s also real data, like users’ friends, interests, biographical information and so on.)

Data can be stored in a lot of different ways. I put together a table of the different terms used in different data-storage approaches:

Database/spreadsheet        | Table               | Row              | Column                              | Value, cell
Standard website            | Category, page type | Page (usually)   | Field                               | Value
Semantic MediaWiki          | Category            | Page (usually)   | Property, field, template parameter | Value
Semantic Web                | Class               | Subject          | Predicate, relationship             | Object
Object-oriented programming | Class               | Object, instance | Field, property                     | Value

What’s the most obvious observation here (other than maybe the fact that this too is a table of data)? That, for all of their differences, all of these storage mechanisms are dealing with the same things – they just have different ways to refer to them.

A wrapper around a database

The vast majority of websites that have ever been created have been, at heart, code around a database, where the code is mostly intended to display and modify the contents of that database. That’s true of websites from Facebook to eBay to Craigslist to Wikipedia. There’s often an email component as well, and sometimes a credit card handling component, and often peripherals like ads, but the basic concept is: there’s a database somewhere, and users can use the site to navigate around some or all of its contents, some or all users can also use the site to add or modify contents. The data structure is fixed (though of course it changes, usually getting more complex, over time), and often, all the code to run the site had to be created more or less from scratch.

Of course, not all web products are standalone websites: there’s software to let you create your own blogs, wikis, e-commerce sites, social networking sites, and so on. This software is more generic than standalone websites, but it, too, is tied to a very specific data structure.

So you have millions of hours, or possibly even billions, that have been spent creating interfaces around databases. And in a lot of those projects, the same sort of logic has been implemented over and over again, in dozens of different programming languages and hundreds of different coding styles. This is not to say that all of that work has been wasted: there has been a tremendous amount of innovation, hard work and genius that has gone into all of it, optimizing speed, user interface, interoperability and all of that. But there has also been a lot of duplicated work.

Now, as I noted before, not all data stored in a database should be considered data: blog posts, messages and the like should not, in my opinion. So my point about duplicated work in data-handling may not fully apply to blogs, social networking sites and so on. I’m sure there’s needlessly duplicated work on that side of things as well, but it’s not relevant to this essay. (Though social-networking sites like Facebook do include true structured data as well, about users’ friends, interests, biographical information, etc.)

What about [insert software library here]?

Creating a site, or web software, “from scratch” can mean different things. There are software libraries that work with a database schema, making use of the set of tables and their fields to let you create code around a database without having to do all the drudgework of creating a class from scratch for every table, etc. Ruby on Rails is the best-known example, but there’s a wide variety of libraries in various languages that do this sort of thing: they are libraries that implement what’s known as the active record pattern. These “active record” libraries are quite helpful when you’re a programmer creating a site (I myself have created a number of websites with Ruby on Rails), but still, these are tools for programmers. A programmer still has to write code to do anything but the most basic display and editing of information.

So here’s a crazy thought: why does someone need to write any code at all, to just display the contents of a database in a user-friendly manner? Can’t there be software that takes a common-sense approach to data, displaying things in a way that makes sense for the current set of data?

No database? No problem

And, for that matter, why does the underlying data have to be in a database, as nearly all web software currently expects it to be? Why can’t code that interfaces with a database work just as well with data that’s in a spreadsheet, or an XML file, or available through an API from some other website? After all, data is data – once the data exists, you should be able to display it, and modify it if you have the appropriate permissions, no matter what format it’s in.

It’s too slow to query tens of thousands of rows of data if they’re in a spreadsheet? Fine – so have the application generate its own database tables to store all that data, and it can then query on that. There’s nothing that’s really technically challenging about doing that, even if the amount of data stretches to the hundreds of thousands of rows. And if the data or data structure is going to change in the outside spreadsheet/XML/etc., you can set up a process to have the application keep re-importing the current contents into its internal database and delete the old stuff, say once a day or once a week.
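The re-import idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: the outside source is available as CSV text, SQLite stands in for the application's internal database, and the function and table names (`import_rows`, `activities`) are made up for the example.

```python
import csv
import io
import sqlite3

def import_rows(conn, table, csv_text):
    """Replace the contents of `table` with the rows found in `csv_text`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line holds the column names
    columns = ", ".join('"%s"' % name.replace('"', '""') for name in header)
    placeholders = ", ".join("?" for _ in header)
    quoted_table = '"%s"' % table.replace('"', '""')
    with conn:  # one transaction: drop the old copy, then load the new one
        conn.execute("DROP TABLE IF EXISTS %s" % quoted_table)
        conn.execute("CREATE TABLE %s (%s)" % (quoted_table, columns))
        conn.executemany(
            "INSERT INTO %s VALUES (%s)" % (quoted_table, placeholders),
            reader,
        )

conn = sqlite3.connect(":memory:")
import_rows(conn, "activities", "id,name,day\n1,Chess club,2013-01-14\n2,Choir,2013-01-15\n")
# A later scheduled run simply calls the same function with the current contents.
import_rows(conn, "activities", "id,name,day\n1,Chess club,2013-01-14\n")
```

Dropping and recreating the table on every run also picks up changes to the column list, which covers the case where the outside data structure itself changes.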

Conversely, if you’re sure that the underlying data isn’t being modified, you could have the application also allow users to modify its data, and then propagate the changes back to the underlying source, if it has permissions to do so.

Figuring out the data structure, and other complications

Now, you may argue that it’s not really possible to take, say, a set of Excel spreadsheets and construct an interface out of it. There’s a lot we don’t know: if there are two different tables that contain a column called “ID”, and each one has a row with the value “1234” for that column, do those rows refer to the same thing? And if there’s a column that mostly contains numbers, except for a few rows where it contains a string of letters, should that be treated as a number field, or as a string? And so on.

These are valid points – and someone who wants to use a generic tool to display a set of data will probably first have to specify some things about the data structure: which fields/columns correspond to which other fields/columns, what the data type is for each field, which fields represent a unique ID, and so on. (Though some of that information may be specified already, if the source is a database.) The administrator could potentially specify all of that “meta-data” in a settings file, or via a web interface, or some such. It’s some amount of work, yes – but it’s fairly trivial, certainly compared to programming.
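To make the "mostly numbers" question concrete, here is a minimal Python sketch of one possible approach: guess the type from the values, and let an administrator's setting win whenever one exists. The 95% threshold, the function names and the `overrides` dictionary are all assumptions made for the example, not part of any existing tool.

```python
def infer_type(values, threshold=0.95):
    """Guess 'number' or 'string' for a column from its non-empty values."""
    filled = [v.strip() for v in values if v and v.strip()]
    if not filled:
        return "string"

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    numeric = sum(1 for v in filled if is_number(v))
    # Call it a number column only if nearly every filled cell parses as one.
    return "number" if numeric / len(filled) >= threshold else "string"

def column_type(name, values, overrides):
    """An explicit setting from the administrator beats the guess."""
    return overrides.get(name) or infer_type(values)

overrides = {"zip_code": "string"}  # hypothetical settings-file contents
print(column_type("zip_code", ["10001", "10002"], overrides))
print(column_type("height", ["1.8", "1.7", ""], overrides))
```

The override matters for columns like ZIP codes, which look numeric but should never be summed or range-filtered.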

Another complication is read-access. Many sites contain information that only a small percentage of their users can access. And corporate sites of course can contain a lot of sensitive information, readable only to a small group of managers. Can all of that read-access control really be handled by a generic application?

Yes and no. If the site has some truly strange or even just detailed rules on who can view what information, then there’s probably no easy way to have a generic application mimic all of them. But if the rules are basic – like that a certain set of users cannot view the contents of certain columns, or cannot view an entire table, or cannot view the rows in a table that match certain criteria, then it seems like that, too, could be handled via some basic settings.
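The three basic kinds of rule mentioned above (hidden columns, hidden tables, hidden rows matching some criteria) fit into a small settings structure. The Python below is a sketch only; the rule keys (`hide_table`, `hide_columns`, `hide_rows_where`) and the group names are invented for the illustration.

```python
def visible_rows(rows, rules, group):
    """Apply one user group's read-access rules to a table of rows (dicts)."""
    rule = rules.get(group, {})
    if rule.get("hide_table"):
        return []
    hidden_columns = set(rule.get("hide_columns", ()))
    hide_row = rule.get("hide_rows_where", lambda row: False)
    return [
        {key: value for key, value in row.items() if key not in hidden_columns}
        for row in rows
        if not hide_row(row)
    ]

staff = [
    {"name": "Ann", "salary": 10, "dept": "hr"},
    {"name": "Bob", "salary": 20, "dept": "legal"},
]
rules = {
    "employees": {
        "hide_columns": ["salary"],
        "hide_rows_where": lambda row: row["dept"] == "legal",
    },
    "guests": {"hide_table": True},
}
print(visible_rows(staff, rules, "employees"))
```

A group with no entry in `rules` sees everything here; a real application would more likely want the opposite default.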

Some best practices

Now, let’s look at some possible “best practices” for displaying data. Here are some fairly basic ones:

  • If a table of data contains a field storing geographical coordinates (or two fields – one for latitude and one for longitude), chances are good that you’ll want to display some or all of those coordinates in a map.
  • If a table of data contains a date field, there’s a reasonable chance that you’ll want to display those rows in a calendar.
  • For any table of data holding public information, there’s a good chance that you’ll want to provide users with a faceted search interface (where there’s a text input for each field), or a faceted browsing/drill-down interface (where there are clickable/selectable values for each field), or both, or some combination of the two.

If we can make all of these assumptions, surely the software can too, and provide a default display for all of this kind of information. Perhaps having a map should be the default behavior, that happens unless you specify otherwise?

But there’s more that an application can assume than just the need for certain kinds of visualization interfaces. You can assume a good amount based on the nature of the data:

  • If there are 5 rows of data in a table, and it’s not a helper table, then it’s probably enough to just have a page for each row and be done with it. If there are 5,000 rows, on the other hand, it probably makes sense to have a complete faceted browsing interface, as well as a general text search input.
  • If there are 10 columns, then, assuming you have a page showing all the information for any one row of data, you can just display all the values on that one page, in a vertical list. But if you have 100 columns, including information from auxiliary tables, then it probably makes sense to break up the page, using tabs or “children” pages or just creative formatting (small fonts, use of alternating colors, etc.)
  • If a map has over, say, 200 points, then it should probably be displayed as a “heat map”, or a “cluster map”, or maps should only show up after the user has already done some filtering.
  • If the date field in question has a range of values spread out over a few days, then just showing a list of items for each day makes sense. If it’s spread out over a few years, then a monthly calendar interface makes sense. And if it’s spread out over centuries, then a timeline makes sense.
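Rules of thumb like these are easy to write down as code. Here is a rough Python sketch; the display names and every cutoff except the 200-point map limit (100 rows, 31 days, 100 years) are my own guesses for the sake of the example.

```python
from datetime import date

def suggest_displays(row_count, dates=(), point_count=0):
    """Pick default displays from the size and nature of the data."""
    displays = []
    if row_count > 100:
        displays += ["faceted browsing", "text search"]
    else:
        displays.append("page per row")
    if point_count > 200:
        displays.append("cluster map")  # too many points for plain markers
    elif point_count > 0:
        displays.append("point map")
    if dates:
        span_days = (max(dates) - min(dates)).days
        if span_days <= 31:
            displays.append("daily list")
        elif span_days <= 365 * 100:
            displays.append("monthly calendar")
        else:
            displays.append("timeline")
    return displays

print(suggest_displays(5000, [date(2010, 1, 1), date(2013, 1, 1)], point_count=350))
```

An administrator's explicit settings would still override whatever this returns; the point is only that the software has a sensible starting position.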

Somehow software never makes these kinds of assumptions.

I am guilty of that myself, by the way. My MediaWiki extension Semantic Drilldown lets you define a drill-down interface for a table of data, just by specifying a filter/facet for every column (or, in Semantic MediaWiki’s parlance, property) of data that you want filterable. So far, so good. But Semantic Drilldown doesn’t look at the data to try to figure out the most reasonable display. If a property/column has 500 different values, then a user who goes to the drilldown page (at Special:BrowseData) will see 500 different values for that filter that they can click on. (And yes, that has happened.) That’s an interface failure: either (a) those values should get aggregated into a much smaller number of values; or (b) there should be a cutoff, so that any value that appears in fewer than, say, three pages should just get grouped into “Other”; or (c) there should just be a text input there (ideally, with autocompletion), instead of a set of links, so that users can just enter the text they’re looking for; or… something. Showing a gigantic list of values does not seem like the ideal approach.
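Options (b) and (c) can be combined in a few lines. The Python below is a sketch of the idea, not how Semantic Drilldown works; the function name and the 20-link limit are assumptions, while the three-page cutoff is the one suggested above.

```python
from collections import Counter

def filter_values(values, min_pages=3, max_links=20):
    """Decide how to present one filter, given the value found on each page."""
    counts = Counter(values)
    common = {v: n for v, n in counts.items() if n >= min_pages}
    other = sum(n for n in counts.values() if n < min_pages)
    if len(common) > max_links:
        # Still too many links to click through: fall back to a text input.
        return {"widget": "text input"}
    links = sorted(common.items(), key=lambda item: (-item[1], item[0]))
    if other:
        links.append(("Other", other))
    return {"widget": "links", "links": links}

print(filter_values(["jazz"] * 5 + ["rock"] * 3 + ["polka", "ska", "ska"]))
```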

Similarly, for properties that are numbers, Semantic Drilldown lets you define a set of ranges for users to click on: it could be something like 0-49, 50-199, 200-499 and so on. But even if this set of ranges is well-calibrated when the wiki is first set up, it could become unbalanced as more data gets added – for example, a lot of new data could be added, that all has a value for that property in the 0-49 range. So why not have the software itself set the appropriate ranges, based on the set of data?
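One simple way for software to set the ranges itself is equal-frequency binning: sort the values and cut them into groups holding roughly the same number of rows. The Python sketch below shows that technique; it is not something Semantic Drilldown does, and the function name and default of four ranges are assumptions.

```python
def balanced_ranges(numbers, bucket_count=4):
    """Split values into (low, high) ranges holding roughly equal numbers of rows."""
    ordered = sorted(numbers)
    if not ordered:
        return []
    ranges = []
    size = len(ordered) / bucket_count
    start = 0
    for i in range(1, bucket_count + 1):
        end = max(round(i * size), start)
        # Keep equal values together so that no two ranges overlap.
        while 0 < end < len(ordered) and ordered[end] == ordered[end - 1]:
            end += 1
        if end > start:
            ranges.append((ordered[start], ordered[end - 1]))
            start = end
    return ranges

print(balanced_ranges([1, 2, 3, 4, 5, 6, 7, 8]))
```

When most of the data piles up on one value, the function returns fewer ranges than requested rather than several empty or overlapping ones. Recomputing the ranges over only the rows that match the currently selected filters would give the shifting behavior discussed next.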

And maybe the number ranges should themselves shift, as the user selects values for other filters? That’s rarely done in interfaces right now, but maybe there’s an argument to be made for doing it that way. At the very least, having intelligent software that is aware of the data it’s handling opens up those kinds of dynamic possibilities for the interface.

Mobile and the rest

Another factor that should get considered (and is also more important than the underlying subject matter) is the type of display. So far I’ve described everything in terms of standard websites, but you may want to display the data on a cell phone (via either an app or a customized web display), or on a tablet, or on a giant touch-screen kiosk, or even in a printed document. Each type of display should ideally have its own handling. For someone creating a website from scratch, that sort of thing can be a major headache – especially the mobile-friendly interface – but a generic data application could provide a reasonable default behavior for each display type.

By the way, I haven’t mentioned desktop software yet, but everything that I wrote before, about software serving as a wrapper around a database, is true of a lot of enterprise desktop software as well – especially the kind meant to hold a specific type of data: software for managing hospitals, amusement parks, car dealerships, etc. So it’s quite possible that an approach like this could be useful for creating desktop software.

Current solutions

Is there software (web, desktop or otherwise) that already does this? At the moment, I don’t know of anything that even comes close. There’s software that lets you define a data structure, either in whole or in part, and create an interface apparatus around it of form fields, drill-down, and other data visualizations. I actually think the software that’s the furthest along in that respect is the Semantic MediaWiki family of MediaWiki extensions, which provide enormous amounts of functionality around an arbitrary data structure. There’s the previously-mentioned Semantic Drilldown, as well as functionality that provides editing forms, maps, calendars, charts etc. around an arbitrary set of data. There are other applications that do some similar things – like other wiki software, and like Drupal, which lets you create custom data-entry forms, and even like Microsoft Access – but I think they all currently fall short of what SMW provides, in terms of both out-of-the-box functionality and ease of use for non-programmers. I could be wrong about that – if there’s some software I’m not aware of that does all of that, please let me know.

Anyway, even if Semantic MediaWiki is anywhere near the state of the art, it still is not a complete solution. There are areas where it could be smarter about displaying the data, as I noted before, and it has no special handling for mobile devices; but much more importantly than either of those, it doesn’t provide a good solution for data that doesn’t already live in the wiki. Perhaps all the world’s data should be stored in Semantic MediaWiki (who am I to argue otherwise?), but that will never be the case.

Now, SMW actually does provide a way to handle outside data, via the External Data extension – you can bring in data from a variety of other sources, store it in the same way as local data, and then query/visualize/etc. all of this disparate data together. I even know of some cases where all, or nearly all, of an SMW-based wiki’s data comes from externally – the wiki is used only to store its own copy of the data, which it can then display with all of its out-of-the-box functionality like maps, calendars, bulleted lists, etc.

But that, of course, is a hack – an entire wiki apparatus around a set of data that users can’t edit – and the fact that this hack is in use just indicates the lack of other options currently available. There is no software that says, “give me your data – in any standard format – and I will construct a pleasant display interface around it”. Why not? It should be doable. Data is data, and if we can make reasonable assumptions based on its size and nature, then we can come up with a reasonable way to display it, without requiring a programmer for it.

Bringing in multiple sources

And like SMW and its use of the External Data extension, there’s no reason that the data all has to come from one place. Why can’t one table come from a spreadsheet, and another from a database? Or why can’t the data come from two different databases? If the application can just use its own internal database for the data that it needs, there’s no limit to how many sources it was originally stored in.

And that also goes for public APIs that provide general information that can enrich the local information one has. There is a large and growing number of general-information APIs, and the biggest one by far is yet to come: Wikidata, which will hold a queryable store of millions of facts. How many database-centered applications could benefit from additional information like the population of a city, the genre of a movie, the flag of a country (for display purposes) and so on? Probably a fair number. And a truly data-neutral application could display all such information seamlessly to the user – so there wouldn’t be any way of knowing that some information originally came from Wikidata as opposed to having been entered by hand by that site’s own creators or users.

Data is data. It shouldn’t be too hard for software to understand that, and it would be nice if it did.

Categories: Uncategorized.

SMWCon coming to New York, March 20-22

If you use Semantic MediaWiki, or are curious about it, I highly recommend going to SMWCon, the twice-yearly conference about Semantic MediaWiki. The next one will be a month and a half from now, in New York City – the conference page is here. I will be there, as will Jeroen De Dauw, who will be representing both core SMW developers and the extremely important Wikidata project; as will a host of SMW users from corporations, the US government, startups and academia. There will be a lot of interesting talks, the entrance fee is quite reasonable ($155 for a three-day event), and I’m the local chair, so I can tell you for sure that there will be some great evening events planned. (And the main conference will be at the hacker mecca ITP, which is itself a cool spot to check out if you’ve never been there.) I hope some of you can make it!

Announcing our book…

I am happy to announce a project I’ve been working on for a rather long time now: Working with MediaWiki, a general-purpose guide to MediaWiki. It finally was released officially two days ago. It’s available in print-on-demand (where it numbers roughly 300 pages), e-book (.epub and .mobi formats) and PDF form.

As anyone who knows WikiWorks and our interests might expect, Semantic MediaWiki and its related extensions get a heavy focus in the book: a little less than one third of the book is about the Semantic MediaWiki-based extensions. I think that’s generally a good ratio: anyone who wants to learn about Semantic MediaWiki can get a solid understanding of it, both conceptually and in practice; while people who don’t plan to use it still get a lot of content about just about every aspect of MediaWiki.

This book is, in a sense, an extension of our consulting business, because there’s a lot of information and advice contained there that draws directly on my and others’ experience setting up and improving MediaWiki installations for our clients. There’s a section about enabling multiple languages on the same wiki, for instance, which is a topic I’ve come to know fairly well because that’s a rather common request among clients. The same goes for controlling read- and write-access. Conversely, there is only a little information devoted to extensions that enable chat rooms within wikis, even though there are a fair number of them, because clients have never asked about installing chat stuff.

So having this book is like having access to our consultants, although of course at a lower price and with all the benefits of the written word. (And plenty of illustrations.) And I think it’s a good investment even for organizations that do work with us, to get the standard stuff out of the way so that, when it comes time to do consulting, we can focus on the challenging and unique stuff.

Once again, here is the book site: Working with MediaWiki. I do hope that everyone who’s interested in MediaWiki checks it out – my hope is that it could make using MediaWiki simpler for a lot of people.

Launch of Innovention wiki

Today is the launch date of a new WikiWorks site: Innovention wiki, which “showcases the themes of innovation and invention through stories drawn from South Australia.”

Skinning

We were able to design a really nice skin for the site, based on the specs of their designer. It uses a 3-column layout which is kind of uncharted territory as far as MediaWiki skins go. Part of the challenge here was the right-hand column. The search section is a part of the skin, while the maps, photos and videos are generated by the MediaWiki page itself. This was accomplished by putting that stuff into a div which uses absolute positioning.

Another challenge was trying to fit a decent form into a very narrow middle column. The solution was to hide the right column via CSS, since the search form doesn’t really need to be on a form page. Then, the middle column is stretched to cover both columns. This was easy to do, since Semantic Forms helpfully adds a class to the body tag for any formedit page (which works for existing pages) and MediaWiki adds a class to the body tag of any Special page (for when adding a new place with Special:FormEdit). So the content area was accessed with:

.action-formedit #content, .mw-special-FormEdit #content {
   width: (a whole lot);
}

Displaying different stuff to logged in and anonymous users

While on the topic of body attributes, MediaWiki does not add any classes to the body tag which would differentiate logged in from anonymous users. This doesn’t present a problem for the skin, which can easily check if the user is logged in. But what if you wanted to have a part of the MediaWiki content page displayed only for anonymous users? A common example would be exhortations to create an account and/or sign in. That’s something that should be hidden for logged in users. Fortunately, this is easily and cleanly resolved.

Since this was a custom skin, we overrode the Skin class’s handy addToBodyAttributes function (hat tip):

function addToBodyAttributes( $out, $sk, &$bodyAttrs ) {
   $bodyClasses = array();

   /* Extend the body element by a class that tells whether the user is
      logged in or not */
   if ( $sk->getUser()->isLoggedIn() ) {
      $bodyClasses[] = 'isloggedin';
   } else {
      $bodyClasses[] = 'notloggedin';
   }

   if ( isset( $bodyAttrs['class'] ) && strlen( $bodyAttrs['class'] ) > 0 ) {
      $bodyAttrs['class'] .= ' ' . implode( ' ', $bodyClasses );
   } else {
      $bodyAttrs['class'] = implode( ' ', $bodyClasses );
   }

   return true;
}

For the built in skins, this is still easy to do. Just use the same code with the OutputPageBodyAttributes hook in your LocalSettings.php. This function adds a class to the body tag called either “isloggedin” or “notloggedin.” Then add the following CSS to your MediaWiki:SkinName.css:

.isloggedin .hideifloggedin {
   display: none;
}
.notloggedin .hideifnotloggedin {
   display: none;
}

Now in your MediaWiki code simply use these two classes to hide information from anonymous or logged in users. For example:

<span class="hideifnotloggedin">You're logged in, mate!</span>
<span class="hideifloggedin">Dude, you should really make an account.</span>

Combine with some nifty login and formedit links

Or even better, here’s a trick to generate links to edit the current page with a form:

<span class="hideifnotloggedin"> [{{fullurl:{{FULLPAGENAMEE}}|action=formedit}} Edit this page]</span>

…and a bonus trick that will log in an anonymous user and THEN bring him to the form edit page:

<span class="hideifloggedin">[{{fullurl:Special:Userlogin|returnto={{FULLPAGENAMEE}}&returntoquery=action=formedit}} Log in and edit this page.]</span>

It doesn’t get much better than that! See it in action here. Yes, you’d have to make an account to really see it work. So take my word for it.

Spam bots

While on the subject of making an account, it seems that bots have gotten way too sophisticated. One of our clients had been using ConfirmEdit with reCAPTCHA and was getting absolutely clobbered by spam. I’ve found that for low traffic wikis, the best and easiest solution is to use ConfirmEdit with QuestyCaptcha instead. QuestyCaptcha questions are easily broken by an attacker who is specifically targeting that wiki, but very few wikis have gained that level of prominence. The trick is to ask a question that only a human can answer. I’ve had success with this type of question:

Please write the word, “horsse”, here (leave out the extra “s”): ______

Featured article slideshow

This site has a pretty cool main page. The main contributor to that coolness is the transitioning slideshow with various featured articles. Gone are the days when a wiki only featured one page! This was made possible by bringing the Javascript Slideshow extension up to date, which was done by our founder Yaron Koren in honor of Innovention wiki. The articles are inserted manually which gives the user complete control over the appearance. But it would be pretty simple to generate the featured pages with a Semantic MediaWiki query.

Tag Cloud

Also on the main page is a nifty tag cloud. That is done with Semantic Result Formats and its tag-cloud format (by our own Jeroen De Dauw). Maybe I’ll blog more about that as it develops.

Stay tuned

The site will be developed further over the next few weeks, with some neat stuff to come…

Wikidata begins

I regret to say that our consultant Jeroen De Dauw will not be doing any significant work for WikiWorks for at least the next year. Thankfully, that’s for a very good reason: he’s moved to Berlin to be part of the Wikidata project, which starts tomorrow.

Wikidata is headed by Denny Vrandecic, who, like Jeroen, is a friend and colleague of mine; and its goal is to bring true data to Wikipedia, in part via Semantic MediaWiki. There was a press release about it on Friday that got some significant media attention, including this good summary at TechCrunch.

I’m very excited about the project, as a MediaWiki and SMW developer, as a data enthusiast, and simply as a Wikipedia user. This project is quite different from any of the work that I’ve personally been involved with, because Wikipedia is simply a different beast from any standard wiki. There are five challenges that are specific to Wikipedia: it’s massive, it needs to be extremely fast at all times, it’s highly multi-lingual (over 200 languages currently), it requires references for all facts (at least in theory), and it has, at this point, no real top-down structure.

So the approach they will take will be not to tag information within articles themselves, the way it’s done in Semantic MediaWiki, but rather to create a new, separate site: a “Data Commons”, where potentially hundreds of millions of facts (or more?) will be stored, each fact with its own reference. Then, each individual language Wikipedia can make use of those facts within its own infobox template, where that Wikipedia’s community sees fit to use it.

It’s a bold vision, and there will be a lot of work necessary to pull it off, but I have a lot of faith in the abilities of the programmers who are on the team now. Just as importantly, I see the planned outcome of Wikidata as an inevitable one for Wikipedia. Wikipedia has been incrementally evolving from a random collection of articles to a true database since the beginning, and I think this is a natural step along that process.

A set of files was discovered in 2010 that represented the state of Wikipedia after about six weeks of existence, in February 2001. If you look through those pages, you can see nearly total chaos: there’s not even a hint of a unifying structure, or guidelines as to what should constitute a Wikipedia page; over 10% of the articles related in some way to the book Atlas Shrugged, presumably added by a devoted fan.

11 years later, there’s structure everywhere: infobox templates dictate the important summary information for any one subject type, reference templates specify how references should be structured, article-tagging templates let users state precisely the areas they think need improvement. There are guidelines for the first sentence, for the introductory paragraphs (ideally, one to four of them, depending on the article’s overall length), for how detailed sections should be, for when one should link to years, and so on. There are also tens of thousands of categories (at least, on the English-language Wikipedia), with guidelines on how to use them, creating a large set of hierarchies for browsing through all the information. These are all, in my eyes, symptoms of a natural progression toward a database-like system. Why is it natural? Because, if a rule makes sense for one article, it probably makes sense for all of them. Of course, that’s not always true, and there can be sub-rules, exceptions, etc.; but still, there’s no use reinventing the wheel for every article.

People complain that the proliferation of rules and guidelines, not to mention categories and templates, drives away new users, who are increasingly afraid to edit articles for fear of doing the wrong thing. And they’re right. But the solution to this problem is not to scale back all these rules, but rather to make the software more aware of the rules, and the overall structure, to prevent users from being able to make a mistake in the first place. That, at heart, was the thinking behind my extension Semantic Forms: if there’s a specific way of creating calls to a specific template, there’s no point requiring each user to create them in that way, when you can just display a form, let the user only enter valid inputs, and have the software take care of the rest.

Now, Wikidata isn’t concerned with the structuring of articles, but only with the data that they contain; but the core philosophy is the same: let the software take care of anything that there’s only one right way to do. If a country has a certain population (at least, according to some source), then there’s no reason that the users of every different language Wikipedia need to independently look up and maintain that information. If every page about a multiplanetary system already has its information stored semantically, then there’s no reason to separately maintain a hand-generated list of multiplanetary systems. And if, for every page about a musician, there’s already semantic information about their genre, instrument and nationality, then there’s no reason for categories such as “Danish jazz trumpeters”. (And there’s probably much less of a need for categories in general.)

With increased meaning/semantics on one hand, and increased structure on the other, Wikipedia will become more like a database that can be queried, than like a standard encyclopedia. And at that point, the possibilities are endless, as they say. The demand is already there; all that’s missing is the software, and that’s what they’ll be working on in Berlin. Viel Glück!

Categories: Uncategorized.

Dynamically resizing an image

This is something that comes up, particularly when dealing with MediaWiki infoboxes. We had an infobox table floating on the right side of the page with a fixed width. Then there was an image to the left of it that was supposed to take up the rest of the page’s width. The challenge was: What happens as the user shrinks or stretches the browser window? The fixed-width table would stay the same size, but the image would have to grow or shrink with the browser width.

There are some scripts out there that would do this. You don’t need them. Here’s what to do:

{| style="float: right; width: 613px" ... |}
<div id="image-holder" style="margin-right: 630px;">
[[File:Image.jpg]]
</div>

Then add some CSS:

div#image-holder img {
  height: auto;
  width: 100%;
}

Simple, right? Basically, the 100% width combined with the right margin gives us a “100% minus x pixels” effect: the div fills whatever width is left over after its margin, and the image fills the div. The browser responds accordingly.
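
In browsers that support the CSS calc() function, the same “100% minus x pixels” idea can also be written out directly. This is only a sketch of an alternative, not what we used on the site: it assumes the same image-holder div, but with the inline margin-right removed from the wikitext, and it reuses the 630px figure from the markup above.

```css
/* Assumption: the inline "margin-right: 630px" has been removed from #image-holder. */
div#image-holder {
  /* Full width of the parent minus the space reserved for the floated table */
  width: calc(100% - 630px);
}

div#image-holder img {
  height: auto; /* keep the aspect ratio as the width changes */
  width: 100%;  /* fill the holder, whatever its current width */
}
```

The margin approach has the advantage of working in older browsers, so calc() is mostly useful when the subtraction needs to be visible in one place in the stylesheet.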

See it live here.

Categories: Uncategorized.

New AdManager extension

The folks at the American Academy of Ophthalmology asked us to create an AdManager that could be managed in full from the wiki itself. They were using BanManPro, which allows you to create zones, with each zone corresponding to a particular ad. AAO wanted per-page or per-category control of which zone was assigned to which page. We created the AdManager extension, but we designed it so it could be used with OpenX or really any ad service. The extension currently sticks the ads in the sidebar. You can place any number of zones there. See it in action here.

I think the way we hooked into the sidebar was neat and the tips I took from it are useful for any other extension that adds to the sidebar. We of course used the SkinBuildSidebar hook. The problem is that monobook automatically puts a header on top of the new sidebar item – in this case AdManager1 – and puts a border around it. Neither of these was desired. Good thing that MediaWiki also gives it an id (p-AdManager1) so you can use some CSS to hide it. The trick here is that there can be multiple zones added so we may have other ids like p-AdManager2 and so on. So here’s the nifty CSS that will match all of those ids:

div[id*='AdManager'] h5 {
	display: none; /* hide the header */
}
div[id*='AdManager'] .pBody {
	border: none; /* and the border */
	padding-left: 0; /* override the normal padding */
}
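
One caveat: the *= operator matches “AdManager” anywhere in the id, so it would also catch an unrelated element whose id happened to contain that string. If that ever becomes a problem, a stricter variant is to match on the start of the id with ^= instead. This is a sketch rather than what ships with the extension, and it assumes the sidebar ids always begin with p-AdManager, as described above.

```css
/* Matches p-AdManager1, p-AdManager2, ... but not ids that merely contain "AdManager" */
div[id^='p-AdManager'] h5 {
	display: none; /* hide the header */
}
div[id^='p-AdManager'] .pBody {
	border: none; /* and the border */
	padding-left: 0; /* override the normal padding */
}
```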

Unfortunately, you’ll need admin rights to see the 2 special pages that handle the settings. And that’s that.

Categories: Uncategorized.

Integrating MediaWiki while retaining your web site’s look and feel

We finished this project quite a few months ago but I never got around to bragging about it. The nice folks at petwellbeing.com already had a web site and wanted to add a wiki that would completely match it.
There are a few interesting points here. We based it on the Vector skin. The sidebar was a natural fit since the rest of the site already had one. The main problem was positioning the (always difficult) vector tabs and the personal links at the top. The good news is that it can be done, with a bit of trial and error.

You can see it in action here.

We then pulled off the same idea for another one of their sites. (Both of these skins have since been revamped by a fellow WikiWorks consultant.)

I think the main lesson for me was how easy it is to integrate MediaWiki with the rest of a site. We’ve all come across well-designed sites with a wiki component. Click the wiki and you’re off into monobook-land with no (apparent) links to help you return. Now what? As it turns out, it doesn’t have to be that way. With a little bit of effort, that nice theme can be applied to MediaWiki.

Categories: Uncategorized.

Notes from Wikimania 2011

My wife and I got back a week ago from our trip to Israel, which included the Wikimania conference in Haifa – Wikimania is an annual conference mostly about Wikipedia, that, like the Olympics, moves around the globe every time. Wikimania, in this case, was half the goal and half an excuse to visit Israel – which is where I grew up, and where my extended family lives, but which I hadn’t visited in six years.

First, I’ll say that visiting Israel was an absolutely amazing experience. We managed to see a huge amount of the country – Haifa, Tel Aviv, Jaffa, Jerusalem, the West Bank, etc. We met secular Jews, orthodox Jews, Arabs, Russians, Americans (plus all the really diverse attendees of Wikimania, but that’s a separate story). We visited ancient ruins, high-tech parks, leafy neighborhoods, government offices, and settlements on the edge of a desert. And my extended family could not have been nicer – and the same actually holds true for almost everyone I met, which really defies the stereotype of rude Israelis. (I was 90% enthused about it and 10% – dare I say? – disappointed.)

With that said, on to the conference. Here are some general observations of mine:

  • I think there was a general view that this might have been the best-organized, best-run Wikimania ever. The venue was great, the food was amazing, the parties were on-point, the keynote speakers (Yochai Benkler and Joseph Reagle) were well-chosen, and there were lots of nice little touches, like awesome videos at the opening and closing (the closing one must have been prepared in less than a day). The one technical glitch was the lack of enough power outlets – but people managed. I’ve been to four of the seven Wikimanias, and they’ve all been enjoyable and well-run, but this one just seemed to have that something extra. I’m biased, though, because I grew up in Haifa, so maybe I’m the wrong person to ask.
  • I have to admit that I had soured a little on Wikimania beforehand – it’s a great experience for anyone who hasn’t been before, but at some point one can get a little Wikipedia’d out. There are always technical discussions about MediaWiki, but never enough, in my opinion. This one had more than usual, though, and a lot of interest among the attendees, so that was a very positive step. As always, I got to talk to other developers, and as always I got to meet some of them for the first time – like Niklas, two ahead of me on the contributors list.
  • The next Wikimania will be in Washington, D.C. At the end of the conference, the D.C. organizers gave a shambolic presentation that I think had everyone worried. Still, the conference will be in the U.S., which gives them a huge natural advantage in terms of getting both attendees and speakers. I plan to be there, in any case.

And, since we’re a Semantic MediaWiki-related company, some Semantic MediaWiki-related comments:

  • There was one directly SMW-related talk, given by Denny Vrandecic, Daniel Kinzler and me. The talk was mostly about adding SMW to Wikipedia, and not about the software per se. Denny talked about the basic premise of SMW, I went into the details and showed some demos, and then Denny and Daniel talked about the planned upcoming “Wikidata” project, which is meant to supply infobox data to all the different language Wikipedias (and, via RDF, to the world), using SMW as the backend. We had 40 minutes to talk, but we could have easily talked for twice that long – we barely talked about the Semantic Web, didn’t mention projects like DBpedia and Freebase, and the SMW demos were quite minimal. Also, there was a packed room, with about 60 people, and lots of questions and comments at the end. Anyway, you can find more information about the Wikidata proposal here.
  • Wikimedia operations guy Ryan Lane gave a talk about the WMF’s server management, where he mentioned that Semantic MediaWiki was used to store details about their setup. (You can see an explanation of that here.) That was pretty cool.
  • There was a session called “Ask the developers”, where I found out about the MediaWiki style guide, which I hadn’t seen before. It’s awesome, and I’ve actually already modified the error-messages display in Semantic Forms to match what’s in the guide.
  • As at previous Wikimanias, there was a good amount of discussion about making editing easier, both of template calls and of wiki syntax in general. Brion Vibber, the head MediaWiki programmer, talked about it a few times, as did Sue Gardner, the WMF executive director. And Jimmy Wales focused a lot of his talk, which was the final talk of Wikimania, on improving the interface on Wikipedia to make things like requesting page moves easier. That last one is not directly editing-related, but it does tie in to making smarter, more user-friendly interfaces. If anything comes out of either initiative, it will undoubtedly be interesting to the Semantic MediaWiki community.

Categories: Uncategorized.

Why Semantic MediaWiki is better than Sharepoint

We have a new essay up, linked from our main site: Semantic MediaWiki vs. Sharepoint. When people talk about the competitors to Semantic MediaWiki, it’s Microsoft Sharepoint that always comes up first – it’s popular, and for many people the name “Sharepoint” is pretty much synonymous with collaboration software. This essay tries to make our case for why SMW is actually the better tool, in both quality and cost. I put together the essay based on our collective knowledge about Sharepoint, in addition to feedback from people who have used both pieces of software extensively.

Categories: Uncategorized.