I've been working on a new project lately that I'm very excited about. I recently launched Episkills, an initiative to teach epidemiologists to code. Learning to code completely changed how I thought about epidemiology and the future of public health, and I want other people to have access to that perspective. My ultimate goal is for programming to be a core skill taught in all MPH programs. Not just a semester of SAS, but a real foundation in bringing programming into public health practice.
Episkills is an online, self-paced course that teaches Python for data analysis step by step. Once you enroll, you can access the courses forever. You can also download them to execute the code, or save them for future use. I've also embedded a Slack chat in all the lessons (Slack is like instant messaging) so you can talk to me and ask questions as you learn. I hope it will be a resource for students and public health professionals to jump into the world of coding. Check it out at www.episkills.com or go straight to the course at episkills.teachable.com.
My 25th birthday is upon me, so naturally I want to make sure I'm on track. A 2011 paper in PNAS by Jones & Weinberg on Age Dynamics in Scientific Creativity was quite insightful. They find that in modern times, 60% of Nobel Prize-winning research in medicine is conducted by age 40. Fifteen years doesn't seem like much time to dream up, fund, and execute game-changing research, but I remain optimistic. The authors helpfully deposited their data, so I put together a data summary.
Several months ago I heard a fascinating segment on NPR about how astronomers process data by turning it into sound. I thought I knew my data inside and out, but never had I conceptualized it as sound. I was so taken with the idea that I gave it a shot.
The raw data are a record of every person who has been diagnosed with Middle East Respiratory Syndrome coronavirus (MERS), maintained by Andrew Rambaut. About 40% of the records include some identification of cluster membership - in other words, people who may be epidemiologically linked. I used my Python package epipy to algorithmically reconstruct transmission trees from those records. I call them case trees, and you can learn more about them here. Case trees aren't perfect, but they are a plausible approximation of how the disease was spread from person to person.
The next step was to borrow heavily from the example code in cirlab's miditime package. Using the networks generated for the case tree plot, I transformed the outbreak into sound. One outbreak year corresponds to five seconds in the song. Each case is a note; the higher the note on the scale, the further up the transmission tree it is. Days when there are several cases result in multiple notes played at once. You can hear that as the incidence increases in spring 2015, the sound gets quite hectic.
After generating a MIDI file, I used GarageBand to turn the sound into something resembling music by playing the track with two different instruments. Although the subject is quite grim, the resulting song is a fun 11-second rendition of an outbreak's course.
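For anyone who wants to try something similar, here is a minimal sketch of the mapping described above: time in the outbreak becomes time in the song, and position in the tree becomes pitch. This is not the code I used and it does not call miditime at all; it only computes the note timings and pitches. The function name, the C major scale, the 365-day year, and the toy cases are all assumptions made for illustration. I also assume that pitch rises with generation number; flip the mapping if you want the index case to be the highest note.

```python
from collections import namedtuple
from datetime import date

# A note is just a start time (seconds into the song) and a MIDI pitch number.
Note = namedtuple("Note", ["start_seconds", "pitch"])

SECONDS_PER_YEAR = 5.0  # from the post: one outbreak year = five seconds
DAYS_PER_YEAR = 365.0   # approximation, ignores leap years

# Assumption: a C major scale starting at middle C (MIDI note 60).
SCALE = [60, 62, 64, 65, 67, 69, 71, 72]


def cases_to_notes(cases):
    """Turn (onset_date, generation) pairs into a sorted list of Notes.

    generation is the case's depth in its transmission tree (0 = index case).
    Cases with the same onset date get the same start time, so they sound
    together as a chord.
    """
    if not cases:
        return []
    first_day = min(onset for onset, _ in cases)
    notes = []
    for onset, generation in cases:
        elapsed_days = (onset - first_day).days
        start = elapsed_days / DAYS_PER_YEAR * SECONDS_PER_YEAR
        # Generations deeper than the scale is long are clamped to the top note.
        pitch = SCALE[min(generation, len(SCALE) - 1)]
        notes.append(Note(round(start, 3), pitch))
    return sorted(notes)


# Toy input for illustration only; these are not real MERS records.
example_cases = [
    (date(2014, 1, 1), 0),
    (date(2014, 1, 1), 1),
    (date(2015, 1, 1), 2),
]

for note in cases_to_notes(example_cases):
    print(note)
```

The resulting list of start times and pitches is the kind of input a MIDI library needs; writing the actual file is left to whichever library you prefer.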
Today I spent a long time trying to convince LaTeX that I wanted my bibliographies to have section numbers, and to appear in my table of contents. Apparently BibTeX and the article class do not allow this option. None of the solutions I found online worked with my multi-file compiled document, until I found a cached copy of a deleted blog post from 2007.
The solution is actually pretty simple:
The original author says to put it in your document preamble, but that didn't work for me. I inserted it just below my first \bibliography command (I have one for each chapter), and it carried forward through the rest of the document. You can also change section to subsection, or Bibliography to whatever you want your references to be named. The starred form, \section*, also works.
For reasons unknown, the WHO posted and then heavily redacted their leadership statement on Ebola response and WHO reforms. The original statement is embedded below. The current version is here.
During the summer I began converting Ebola situation reports (sitreps) from PDF to a text format. The reports had critical information like the number of new Ebola cases, how many people were in treatment, details about contact tracing, and more. I needed the data for my research on modeling infectious disease, and since the data were completely unusable in PDF, every day I did the painful manual conversion into a standardized machine-readable format*.
I figured that if I needed that data, other people probably did too, so I pushed each day's sitreps to GitHub. Turns out I was correct on that point, and by October, I found myself maintaining a data repository that received a couple thousand hits each day and had a cadre of active contributors. Although I expected some people would find it useful, I had no idea it would receive the attention it did. That shortsightedness meant I had not done a lot of the work needed to make the repository maximally useful.
Here is what I would do if I could start over:
In a recent article published in the Journal of Global Health, Jared Jones argues that anthropology has been misused as a tool to "other" people in Ebola-stricken regions. Jones suggests that rather than participating in the characterization of West Africans as "backwards" or "primitive", anthropologists should highlight structural inequalities that facilitate disease transmission, like poverty and poor health systems.
There's a lot I like about the piece, and if nothing else Jones and I share an appreciation for the work of Paul Farmer. I agree, as I suspect would most public health professionals, that poor public health infrastructure is a primary amplifier of outbreaks. Underdeveloped health systems delay outbreak detection, hamper the speed and power of the response, and generally make achieving control more difficult. I also agree that portraying West African people as ignorant and illogical is nonsense. Historically, even communities without access to or knowledge of what we think of as biomedicine often have sophisticated conceptualizations of infection control.
Where I don't agree with Jones is the assertion that anthropologists should not engage in applied work like "designing education campaigns, explaining the actions of international health teams to locals, and designing “culturally sensitive” intervention strategies." I think this is one of the most powerful tools of anthropology in the midst of an outbreak, and it should not be downplayed in favor of narrating or contextualizing. There's plenty of time for that after an outbreak is over - unlimited time, in fact. In the middle of a crisis, what's needed is action.
As I wrote in a previous post, I believe practitioners (of both anthropology and public health) have an obligation to stand with the communities they serve. Bearing witness is necessary but not sufficient.
By eschewing involvement with the actual outbreak response, anthropologists are missing a huge opportunity to improve public health. Ultimately epidemiology and anthropology (anthrodemiology?) have a lot in common; if they were to work together, so much more could be achieved.
Previous posts on anthropology + epidemiology:
Ebola and poverty
Policy misconduct and the role of advocacy in public health
The public health paradox: Why people don't get flu shots
Many months ago I promised to write about how I organize my scientific literature hoard. The process I've developed works really well for me, and it successfully got me through dissertation writing.
I use Mendeley, a great cross-platform reference manager. I like Mendeley because it syncs across desktops, has adequate annotation features, and works well with BibTeX. One of my favorite features is the 'watch folder' option, which periodically checks a designated folder for new additions. I set this option to watch an "add_to_mendeley" folder where I store all PDFs the minute I open them. That way everything I read gets saved, and I'm not left trying to track down a reference online that I have only a vague memory of. Another key setting is turning on automatic .bib updating, so my .bib file is always my complete library.
When I'm reading a paper in Mendeley, I highlight and add notes as needed. But what I find more helpful is to write a 2-3 line summary of the paper in the Notes section of the Mendeley interface. I also add any thoughts I have about how it relates to my work. I might say "Model structure useful for spillover events," or "Their findings [x] support our previous findings [y]." Sometimes the note is simply "not relevant", but that's still a helpful reminder not to waste my time when I stumble across it again.
When I'm reading a paper, I also make sure to double-check that the automatic citation generator got all of the fields correct, and I add a BibTeX citation key if there isn't one already. I star papers of particular relevance so I can find them again easily. I don't usually bother sorting my papers into folders, since the Mendeley search feature works really well. If I'm doing a lit review in a new field I do sometimes go to the extra trouble to sort, because I do more browsing of the collection, whereas when I'm working in my subfield I usually know which paper I need. When I'm ready to write, I just add my .bib file to my LaTeX document, and cite away.
I'm happy to report that I successfully defended my PhD this month! I still have some odds and ends to complete, including finishing the two courses I'm taking, but for all intents and purposes I've finished grad school. I've had a wonderful time - something I suspect not many students are able to say. The credit for that goes to my lab (NDSSL) and my awesome advisors, Bryan Lewis and Stephen Eubank.
Next up I'll be moving to the Baltimore area to work for the Army as an epidemiologist, which is part of my SMART scholarship. In the meantime I've got a few months of downtime, so send me an email if you have a project you want to chat about.
This morning the World Health Organization released downloadable, machine-readable data (yay!) of the number of Ebola cases at the county level (available here). This release is particularly special, because it includes both the number of cases according to the situation reports and the counts as reported by the patient database. The patient database (also known as a line list) is usually considered the gold standard for outbreak data. Until this release, the public had no data from the database - the situation reports were the only resource.
The good news is that this data is immensely useful for epidemiologists, modelers, public health responders - pretty much everyone involved with Ebola work. The bad news is the situation reports are apparently fairly unreliable. Ideally the two data sources would match up very closely. This is not the case.
It's disappointing that the sitreps are not what we thought they were. But ultimately I'm really glad that this information is being made available, because now we can adjust accordingly. I think we should commend the Ministries of Health and the WHO for releasing this data - I value open data and transparency very highly, even when it brings some surprising results.
[Graphics below the break]
"Send me your data - PDF is fine," said no one ever
The public health paradox ("When public health works, it's invisible")
Let's make data a civic right
Scholarly impact of open access journals
Six months later, disease detectives still battling fungal meningitis outbreak