This morning the World Health Organization released downloadable, machine-readable data (yay!) of the number of Ebola cases at the county level (available here). This release is particularly special, because it includes both the number of cases according to the situation reports and the counts as reported by the patient database. The patient database (also known as a line list) is usually considered the gold standard for outbreak data. Until this release, the public had no data from the database - the situation reports were the only resource.
The good news is that this data is immensely useful for epidemiologists, modelers, public health responders - pretty much everyone involved with Ebola work. The bad news is the situation reports are apparently fairly unreliable. Ideally the two data sources would match up very closely. This is not the case.
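For anyone who wants to check the mismatch themselves, here is a minimal sketch of a county-by-county comparison. It assumes the two sources have already been loaded into plain dictionaries keyed by county name; the function name, the county names, and the counts are all hypothetical placeholders, not values from the WHO release.

```python
def compare_sources(sitrep_counts, database_counts):
    """Return (county, sitrep, database, difference) rows, sorted by county.

    A count is None when a county is missing from that source, and the
    difference is None whenever either count is missing.
    """
    rows = []
    for county in sorted(set(sitrep_counts) | set(database_counts)):
        sitrep = sitrep_counts.get(county)
        database = database_counts.get(county)
        if sitrep is None or database is None:
            difference = None
        else:
            difference = sitrep - database
        rows.append((county, sitrep, database, difference))
    return rows


# Hypothetical placeholder counts, NOT real data from either source.
sitrep = {"County A": 120, "County B": 40, "County C": 15}
database = {"County A": 95, "County B": 52}

rows = compare_sources(sitrep, database)
for county, sitrep_n, database_n, difference in rows:
    print(county, sitrep_n, database_n, difference)
```

A positive difference means the situation report counted more cases than the patient database; a county that appears in only one source is kept in the output rather than silently dropped.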
It's disappointing that the sitreps are not what we thought they were. But ultimately I'm really glad that this information is being made available, because now we can adjust accordingly. I think we should commend the Ministries of Health and the WHO for releasing this data - I value open data and transparency very highly, even when it brings some surprising results.
[Graphics below the break]
When it comes to keeping track of Ebola, there are three categories of cases: confirmed, probable, and suspected. A case is confirmed when a biological sample tests positive for the virus. Probable cases have either been seen by a clinician, or have had close contact with a confirmed Ebola patient. Suspected cases have the signs and symptoms of Ebola, but have not been evaluated by a clinician nor had a sample tested.
Keeping accurate records of case counts is important to outbreak control. We need to know how many people are sick in order to plan for how many hospital beds, healthcare workers, PPE kits, etc. will be needed. Without a solid understanding of the disease burden, it's difficult to mount an effective response.
The symptoms of Ebola virus disease are fairly nonspecific, especially in the beginning: fever, headache, muscle aches, malaise. So suspected cases may not have Ebola after all, but may instead be suffering from malaria, Lassa fever, or some other illness. However, the World Health Organization cautions that "a substantial proportion of these suspected cases are most probably genuine cases of EVD."
In order to keep those accurate records, samples from suspected and probable cases must be collected properly, and transported to the laboratories capable of doing the testing. The laboratories must then process the samples, record the results, and return those results to the relevant public health department. Each step is an opportunity for attrition. Sierra Leone and Liberia provide data on the number of cases in each category.
Notice: we are hosting a Computing for Ebola Challenge hackathon the week of Oct 6. To learn more and register to participate, visit HackerLeague.
Contact tracing is a classic public health intervention. It's no easy task even during small outbreaks - people who had physical contact with someone with an infectious disease are called or visited every day by a public health worker. If they develop symptoms, they are isolated to prevent them from spreading the disease further. When a single case of MERS-CoV was imported into the United States, over 500 people were under follow-up.
With an outbreak as large as this one, the number of contacts requiring follow-up is dramatic. Guinea, Sierra Leone and Liberia combined have accumulated well over 30,000 contacts, each of whom needs to be followed daily for 21 days. Some of those have finished their follow-up period, but many thousands have not. (Several counties in Liberia are still without vehicles for contact tracing, and I assume the situation is similar in SL and Guinea. But that's a conversation for another day.)
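To put that workload in numbers, here is a rough back-of-the-envelope sketch. The 30,000 contacts and the 21-day period come from the figures above; the number of active contacts and the number of contacts one tracer can reach per day are hypothetical assumptions chosen only to show the arithmetic.

```python
import math

FOLLOW_UP_DAYS = 21  # daily follow-up period per contact, from the post


def total_follow_up_visits(contacts, days=FOLLOW_UP_DAYS):
    """Daily checks needed if every contact completes the full period."""
    return contacts * days


def tracers_needed(active_contacts, contacts_per_tracer_per_day):
    """Tracers required to reach every active contact once per day."""
    return math.ceil(active_contacts / contacts_per_tracer_per_day)


# 30,000 contacts followed for 21 days each is 630,000 daily checks.
visits = total_follow_up_visits(30_000)

# Hypothetical assumptions: 10,000 contacts still under follow-up and
# 15 contacts reached per tracer per day. Neither figure is from the data.
tracers = tracers_needed(active_contacts=10_000, contacts_per_tracer_per_day=15)

print(visits, tracers)
```

The first figure is a lower bound for the cumulative total, since the post says "well over" 30,000 contacts. The second changes a lot depending on how many contacts one worker can realistically reach, which is exactly why missing vehicles matter.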
We now have Guinea data! However, it is locked in PDF... in French... in irregular tables, so it is not yet digitized. Fear not, I can offer a preview from the most recent situation report, which was published on Sept 17 for data through Sept 16, 2014.
The very first sentence says, "The resistance continues in Forest Guinea with respect to awareness of health and administrative authorities on the Ebola virus Wamey in the Health District Nzerekore." This is not good. Not good at all. A group of health outreach workers were killed on Sept 19 in Nzerekore, two days after this report was published. Reports suggest that security in that region has been unstable for a while now. Changing human behavior is the keystone of infection control for Ebola, so resistance is a major hurdle to stopping transmission.
There appear to be two regions with active outbreaks (the red counties are 'foyers actifs', the green counties are 'foyers calmes'). A two-front war is not ideal, but I don't know too much about how Guinea is handling the outbreak so I'll leave it at that. The red southern counties border Sierra Leone and Liberia. Nzerekore, the location of resistance, is the red county in the southeast.
If you're just joining the #HackEbola series, check out the introduction, and see the other posts by clicking on the Ebola category. All this data is available at github.com/cmrivers/ebola.
Let's skip over to Sierra Leone for a bit. Although there are fewer cases in Sierra Leone, it's still suffering widespread transmission. Kailahun and Kenema counties in particular are hard hit, with 601 and 438 cases respectively, as of Sept 17. However, these counties are relatively populous. Comparing the number of cases in each county to its population reveals that both counties lead in both relative and absolute terms.
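The relative comparison is just cases scaled by population. Here is a minimal sketch of that calculation, assuming a cases-per-100,000 rate; the county names, case counts, and populations below are hypothetical placeholders, because this post does not list the population figures used.

```python
def cases_per_100k(cases, population):
    """Cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


def rank_by_rate(counties):
    """Sort (name, cases, population) tuples by per-capita rate, highest first."""
    return sorted(
        counties,
        key=lambda county: cases_per_100k(county[1], county[2]),
        reverse=True,
    )


# Hypothetical placeholder counties, NOT real Sierra Leone figures.
counties = [
    ("County A", 200, 1_000_000),  # more cases, larger population
    ("County B", 50, 100_000),     # fewer cases, smaller population
]

for name, cases, population in rank_by_rate(counties):
    print(name, cases_per_100k(cases, population))
```

In this made-up example the county with fewer cases has the higher rate, which is why it is worth checking both measures before calling a county the hardest hit.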
Read about my motivation for this series here.
The number of Ebola cases is growing. Roughly exponentially, in fact. Cases in Liberia continue to grow exponentially (2,712 as of Sept 17). Sierra Leone is a bit slower, but not slow enough (1,603 cases). Guinea looked like it was getting under control until the last several weeks, when case counts began to climb again (861 cases).
The Ebola outbreak in West Africa was first recognized in March of 2014, but it likely crossed over from animals to humans in late 2013. In the intervening months, it has grown to an epidemic of unprecedented severity. It has infected more people than every other known Ebola outbreak combined - around 5,500 recognized cases, as of this writing. This species of Ebola has a case fatality risk of 70-90%, so the number of human lives already lost is truly astounding.
Despite the severity of the current situation, the months to come threaten to be even worse. Exponential growth continues unabated in Liberia in particular, which means that every sick person infects two other people on average. That may not sound like a lot, but tens or hundreds of thousands of cases in the forthcoming months is not out of the question. Sierra Leone and Guinea are experiencing less dramatic growth, but are producing a catastrophic number of cases nonetheless.
What looks like a slow trickle of cases in the beginning is simply the way exponential growth works - one to two to four cases isn't all that many. But when you get into hundreds or thousands of cases, the number of new cases produced is staggering. I've seen many media reports claim that the epidemic is accelerating, but that's not the case. It's simply following the trajectory it has been on all along.
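Here is a toy sketch of that point, assuming each case produces exactly two new cases per generation (the "infects two other people" figure above). The single seed case and the twelve generations are arbitrary illustration choices, and this is not a fitted epidemic model.

```python
def new_cases_by_generation(seed_cases, cases_caused_per_case, generations):
    """New cases in each generation when every case causes a fixed number of new ones."""
    series = [seed_cases]
    for _ in range(generations):
        series.append(series[-1] * cases_caused_per_case)
    return series


series = new_cases_by_generation(seed_cases=1, cases_caused_per_case=2, generations=12)

# Growth factor between consecutive generations: constant throughout.
ratios = [later / earlier for earlier, later in zip(series, series[1:])]

# Absolute jump between consecutive generations: keeps getting larger.
increments = [later - earlier for earlier, later in zip(series, series[1:])]

print(series)
print(ratios)
print(increments)
```

The growth factor never changes, but the jump between generations goes from a single case at the start to thousands by the end. That is the sense in which the epidemic is on the same trajectory it has been on all along, even though the raw numbers look like acceleration.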
What's significant about that point is that the current state (and the situation to come) were foreseeable and foreseen. Ignoring the outbreak for the first six months was folly, but continuing to ignore it is a "threat to global peace and security".
In an effort to help, I have been digitizing data on the Ebola outbreak in West Africa, which is released by the WHO, the Liberia Ministry of Health, and the Sierra Leone Ministry of Health in PDF format. The data are available for download on my github. I haven't seen any analyses of the county-level data online - the PDF format and the multi-dimensional aspect render it a little more inaccessible than a normal data set. Over the next several days and weeks, I will be analyzing these data and publishing my findings here on this blog. I hope that they are useful for helping people to understand the severity of the outbreak, and perhaps they will even be useful for public health planning and response.
You can follow along by searching the ebola category on my blog, adding me to your RSS feed, or monitoring #hackebola on Twitter. I also encourage you to download the data and develop analyses of your own - tag them with the hashtag, and add them to the analyses/ folder on my github too.
Update: If you are just joining us, know that the series has begun. Click the ebola category to see the installments.