Government data can be a treasure trove of the useful, the interesting, and the profitable. However, your government should not hide it like buried treasure – it should be easy to find, and once you find it, it should be easy to use. Giving someone a book is nice, but if they can’t read the language it’s written in, it’s not very useful. Information works the same way. Of course, different types of information need to be available in different formats to be usable, and we’ll talk a little about that too.
The keys to making data findable lie in its structure and its metadata. Okay, that was super-geeky. If you think of a dataset as a spreadsheet, each row as you go down the sheet is an item in the dataset, and each column as you go across is something about that item. So if you had a list of emails sent by the President, each row would be a different email, and the columns would be things like the date sent, the subject of the email, the to: field, etc. If the data is well-structured, then you can search for all the emails that were sent to the President’s mom during the first week in office. If the structure is inconsistent (say, dates written three different ways), then this can be really hard. Just as the structure of the data makes the information within it findable, the metadata tells you about the structure (so you can use it) and ALSO makes it possible to find the right dataset in the first place. Your government produces a LOT of information, and the metadata is the data about the data: it tells you what the dataset contains. Easy, right? Look, think of metadata as the cover of the book. It should tell you the title, who wrote it, and give you a brief synopsis on the back cover. Data works the same way – you don’t want to have to guess what a dataset is about by reading through the data itself. Too much context would be lost that way.
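To make the email example a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the column names (`date_sent`, `subject`, `to`), the sample rows, the addresses, and the `METADATA` record are assumptions, not a real government dataset or an official metadata standard. It only shows that when the rows and columns are consistent, "emails to mom in the first week" becomes a few lines of code.

```python
import csv
import io
from datetime import date

# Hypothetical metadata record: the "cover of the book" for the dataset.
METADATA = {
    "title": "Emails sent by the President",
    "publisher": "Office of the President (hypothetical)",
    "description": "One row per email sent.",
    "columns": {
        "date_sent": "Date the email was sent, as YYYY-MM-DD",
        "subject": "Subject line of the email",
        "to": "Recipient address",
    },
}

# Made-up sample rows; a real dataset would be read from a published file.
SAMPLE_CSV = """date_sent,subject,to
2021-01-20,First day,mom@example.com
2021-01-22,Budget meeting,treasury@example.org
2021-01-25,Dinner on Sunday?,mom@example.com
2021-02-03,Thanks for the cake,mom@example.com
"""


def find_emails(rows, recipient, start, end):
    """Return rows sent to `recipient` between `start` and `end`, inclusive."""
    matches = []
    for row in rows:
        sent = date.fromisoformat(row["date_sent"])
        if row["to"] == recipient and start <= sent <= end:
            matches.append(row)
    return matches


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
first_week = find_emails(rows, "mom@example.com", date(2021, 1, 20), date(2021, 1, 26))
for email in first_week:
    print(email["date_sent"], email["subject"])
```

Notice that the search only works because every row writes its date the same way and uses the same column names. The `METADATA` record is what would tell a stranger what those columns mean without having to read the rows and guess.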
The other factor to consider when making data accessible is whether it is usable. This probably seems obvious to you, but it can actually be quite complicated. For instance, the form, format, quality, readability, accuracy, timeliness, and relevance of the data all affect its usability. Your government should think about who the most likely audience for the data is, and then release it in the most neutral, widely accepted formats possible. While it’s not realistic to expect your government to accurately predict everyone who will end up using the data (no offense to your government – these things sometimes find unexpected users), the process of thinking it through will help make the data usable by the greatest number of people.
A few examples will hopefully help bring this to life. Data released in print or as a PDF is easy for a human being (such as yourself) to read but hard for a computer to process, so this is not so good for budget numbers and spending data that people will most likely want to analyze. Data with lots of numbers, or long lists of things such as emails, should be released so that it can be easily opened and used in a spreadsheet program (like Microsoft Excel) or, if there’s a lot of data, maybe even imported into a database (like MySQL). Where this gets really fun, however, is when the data is regularly updated and people will want regular access to the new data. In this case, your government should consider setting up a database of its own with a read API that will allow anyone with the technical know-how to build a program that pulls information directly from the government source instead of hosting a separate copy.
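As a rough illustration of why spreadsheet-friendly formats matter, here is a small Python sketch that loads spending data from CSV text into a database and totals it by department. The figures, the column names, and the `spending` table are invented for the example. It also uses SQLite (which ships with Python) in place of MySQL, purely so the sketch runs without installing anything; the idea is the same.

```python
import csv
import io
import sqlite3

# Made-up spending figures standing in for a published CSV file.
SPENDING_CSV = """department,year,amount
Health,2023,1200
Education,2023,900
Health,2024,1500
Education,2024,1100
"""


def load_spending(conn, csv_text):
    """Create a `spending` table and fill it from CSV text."""
    conn.execute(
        "CREATE TABLE spending (department TEXT, year INTEGER, amount INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # Convert the numeric columns explicitly; CSV values arrive as strings.
    rows = [(r["department"], int(r["year"]), int(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    conn.commit()


def total_by_department(conn):
    """Return {department: total amount} across all years."""
    cursor = conn.execute(
        "SELECT department, SUM(amount) FROM spending "
        "GROUP BY department ORDER BY department"
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
load_spending(conn, SPENDING_CSV)
totals = total_by_department(conn)
print(totals)
```

None of this is possible if the same numbers only exist as a printed page or a scanned PDF. A read API would go one step further: the government would run queries like `total_by_department` on its own database and hand the results to any program that asks, so nobody has to download and reload the file every time it changes.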
In 2007, a meeting of open data experts laid out eight principles of open government data. They’re worth a look.