There's a serious problem with the current state of shared data: much of it is almost completely unusable! Here are some ideas for sharing it more effectively.
I often have a question I'd like to answer for which I know data are available. Most recently, I wanted to look up the incidence (number of new cases) of various infectious diseases over the last decade. This should be easy: CDC publishes exactly those counts in the Morbidity and Mortality Weekly Report. Well, the data are indeed available, but only in PDF. Why even bother with computers? They might as well mail around a printout. If I wanted to actually analyze the data, I would first need to enter a decade's worth of it by hand. Ain't nobody got time for that.
I don't mean to pick on CDC. County Health Rankings is an awesome website that aggregates public health data from a variety of sources and releases it for download. I'm grateful for that, but each Excel file they release has multiple sheets, nested headers, merged cells, and extra columns of confidence intervals. That makes the data nearly impossible to analyze in any program other than Excel. To use it elsewhere, I first have to manually select and reformat the data I want, rename the variables, and then copy and paste everything into a new file, which rather defeats the purpose.
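To make the cost concrete, here is a minimal sketch of just one piece of that cleanup: collapsing a two-row header (as left behind by merged cells) into single, flat column names. It assumes the sheet has already been exported to CSV and that merged header cells come out as blanks; the `flatten_two_row_header` helper and the sample layout are hypothetical illustrations, not the actual County Health Rankings file, and the numbers are made-up placeholders.

```python
import csv
import io


def flatten_two_row_header(text: str) -> list[dict[str, str]]:
    """Turn a CSV with a two-row header into records with one flat column name each.

    Hypothetical helper. Assumes merged cells in the top header row were
    exported as empty strings, so each blank top-level cell inherits the
    label to its left.
    """
    rows = list(csv.reader(io.StringIO(text)))
    top, sub, body = rows[0], rows[1], rows[2:]

    # Forward-fill the top header row to undo the merged cells.
    filled = []
    current = ""
    for cell in top:
        current = cell.strip() or current
        filled.append(current)

    # Join the two header levels into one name, e.g. "Adult smoking | Value".
    names = [
        f"{t} | {s.strip()}" if t and s.strip() else (t or s.strip())
        for t, s in zip(filled, sub)
    ]
    return [dict(zip(names, row)) for row in body]


# Placeholder layout and values for illustration only; NOT real data.
sample = (
    "County,Adult smoking,,\n"
    ",Value,95% CI low,95% CI high\n"
    "County A,0.18,0.16,0.20\n"
)
records = flatten_two_row_header(sample)
print(records[0]["Adult smoking | Value"])
```

Even this sketch only handles one sheet and one quirk; multiple sheets and inconsistent layouts each need their own special-case code, which is exactly the hassle a flat file avoids.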
There are about eight million other examples that I had to restrain myself from listing. The point is that sub-optimal sharing practices make it difficult for researchers (of both the professional and citizen variety) to actually use shared data. The research either a) won't get done because it's too much of a hassle, b) will contain errors from manual data entry, or c) will take far longer than it should. Possibly all of the above. With that in mind, I came up with some tips to level up your data sharing.
Learn how to step up your sharing game:
Bookmark these guidelines. Next time you reach for the 'export to PDF' button, or start using Excel's change-cell-border feature, pull them out and remind yourself: 'This is not machine-readable. Nobody will use my data if I release it like this.' Then rejoice that you are awesome for sharing your data, and for doing so in a way that is actually useful. And for that, I thank you.
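On the publishing side, a machine-readable release can be as simple as a plain CSV file with one header row and one observation per row. The sketch below uses Python's standard `csv` module to write such a file; the `to_tidy_csv` helper, the column names, and the records are hypothetical placeholders, not real data.

```python
import csv
import io

# Hypothetical placeholder records for illustration only; these are NOT real data.
records = [
    {"county": "County A", "year": 2012, "measure": "adult_smoking", "value": 0.18},
    {"county": "County B", "year": 2012, "measure": "adult_smoking", "value": 0.22},
]


def to_tidy_csv(rows):
    """Serialize dicts as plain CSV: a single header row, one observation per row,
    no merged cells, no formatting. Hypothetical helper, not a library API."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_tidy_csv(records))
```

In this long layout, adding another county, year, or measure just means adding rows, and the result opens directly in R, Python, a database, or Excel itself without any manual reformatting.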
Read my follow-up post: Let's make data a civic right