At Sykdomspulsen we adopt the Tidyverse principles.
Sykdomspulsen’s deliverables (reports and website) support evidence-based decision making during the COVID-19 pandemic, so it is crucial that the code is of high quality and can be developed at a fast pace. The Sykdomspulsen team members come from a variety of disciplines, ranging from statistics, epidemiology, ecology and medicine to computer science. We are a small team, and it is often necessary to take over tasks from another person to deliver on time. In this post I will talk about how the Sykdomspulsen team follows the tidyverse principles for fast and high-quality task development.
The unifying principles introduced by the tidyverse team serve the purpose of making code easier to work with for the humans doing the data analysis. There are four core principles:
Human centered: code should serve the human analysts, especially since R is mostly used by non-programmers. Ease of use is key.
Consistent: code style should be consistent, so that one only needs to learn one style and it applies everywhere.
Composable: a complex task should be decomposed into smaller, independent tasks.
Inclusive: the tidyverse community is welcoming and friendly, which creates a good learning environment.
These four principles depend on one another. The human is at the core: code that is consistent and composable is easier to develop and to troubleshoot, and an inclusive community lets every member benefit from the others.
Here are some of the deliverables by Sykdomspulsen’s 8 members:
13 R packages (splverse) that help with task management, data cleaning and manipulation, modeling and prediction, and reporting,
over 1000 automated daily COVID-19 reports with near real-time analysis and results,
one shiny website with hundreds of regular users (municipal public health officers), covering indicators of interest such as COVID-19 hospitalisations and deaths, influenza, GP consultations on infectious diseases, mortality and mobility.
All these deliverables are generated with R, which is completely free (although we still need to pay for RStudio Workbench and the servers!). Here I will explain how the four tidyverse principles help the Sykdomspulsen team succeed as a small data science team with limited time and resources.
The Sykdomspulsen team is made up of talented people with different skill sets: statistics, programming, web design and domain knowledge in medicine and public health. The level of expertise in R (or programming in general) differs, yet everyone is able to contribute to our deliverables. Task decomposition is crucial: people who can handle the more R-demanding tasks can focus on data cleaning and analysis, while the user interface (UI) tasks can be taken by administrative team members. For example, our coordinator plays a key role in updating the shiny website to communicate important messages to local municipal public health officials.
Needless to say, this also enables an inclusive and diverse team where people can contribute with their own strengths.
When we develop code, we keep readability in mind and try to write as many comments as we can. Since we want the code to be comprehensible to everyone, we use explicit names for functions and datasets. The names can be long sometimes, but we believe it is more important to let people understand a task at a glance than to make them look up abbreviations. We have our own style guide and use it throughout our tasks.
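As a small illustration of the explicit-naming idea, here is a minimal sketch in base R. The dataset, the column names and the function name are all hypothetical and invented for this example; they are not taken from the Sykdomspulsen style guide or packages.

```r
# Hypothetical dataset: daily hospital admissions for two locations.
# The long name says what the data contains and at which time resolution.
covid_hospital_admissions_daily <- data.frame(
  location_code = c("norge", "norge", "county03", "county03"),
  date = as.Date(c("2022-01-03", "2022-01-04", "2022-01-03", "2022-01-04")),
  hospital_admissions_n = c(40L, 52L, 7L, 9L)
)

# An abbreviated name such as `agg_ha()` would force readers to look it up.
# The explicit (hypothetical) name below states what the function does.
calculate_total_hospital_admissions_by_location <- function(admissions_daily) {
  # Sum the admissions within each location.
  aggregate(
    hospital_admissions_n ~ location_code,
    data = admissions_daily,
    FUN = sum
  )
}

total_hospital_admissions_by_location <-
  calculate_total_hospital_admissions_by_location(covid_hospital_admissions_daily)

print(total_hospital_admissions_by_location)
```

The names are longer to type, but a colleague taking over the task can tell what each object holds without opening the function body.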
Sykdomspulsen has many deliverables, and they are complex. For example, each of the daily COVID-19 reports covers different outcome indicators (e.g. hospitalization numbers, cases, risk levels, vaccination by different doses and many more), at different geographical levels (national, county, municipality), for different age groups and at different time resolutions (daily or weekly). The national report is 24 pages long, with around 30 tables and graphs. Navigating one report is already quite some work; try doing that for all configurations!
Divide and conquer. Complex tasks can, and should, be decomposed into smaller chunks. One obvious benefit is that we can use Airflow to schedule separate tasks dedicated to data import, data cleaning, computation, result compilation and graph making. Another benefit is parallelization: independent chunks can run at the same time. When time is limited, this can be very useful!
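The decomposition described above can be sketched as follows. This is a minimal illustration, not Sykdomspulsen's actual pipeline: all function names, location codes and numbers are hypothetical, the import step simulates data instead of reading a real source, and the base R parallel package stands in for a scheduler such as Airflow.

```r
library(parallel)

# Step 1: data import (hypothetical; simulated instead of reading a real source).
import_cases <- function(location_code) {
  data.frame(
    location_code = location_code,
    date = as.Date("2022-01-03") + 0:6,
    cases_n = seq_len(7) * 10L
  )
}

# Step 2: data cleaning (drop missing or negative counts).
clean_cases <- function(cases_raw) {
  cases_raw[!is.na(cases_raw$cases_n) & cases_raw$cases_n >= 0, ]
}

# Step 3: computation (one weekly total per location).
summarise_cases_weekly <- function(cases_clean) {
  data.frame(
    location_code = cases_clean$location_code[1],
    cases_weekly_n = sum(cases_clean$cases_n)
  )
}

# The small steps compose into one pipeline for a single location.
run_pipeline_for_location <- function(location_code) {
  summarise_cases_weekly(clean_cases(import_cases(location_code)))
}

# Hypothetical location codes. Each location is independent of the others,
# so the pipeline can run for several locations at the same time.
location_codes <- c("norge", "county03", "municip0301")

cluster <- makeCluster(2)
# The worker processes start empty, so the step functions are sent to them.
clusterExport(
  cluster,
  c("import_cases", "clean_cases", "summarise_cases_weekly"),
  envir = environment()
)
results <- parLapply(cluster, location_codes, run_pipeline_for_location)
stopCluster(cluster)

# Result compilation: combine the per-location summaries into one table.
weekly_summary <- do.call(rbind, results)
print(weekly_summary)
```

Because each step is a small function with a single job, a step can be tested, replaced or rerun on its own, and the same functions work whether they are called in a loop, in parallel, or from separate scheduled tasks.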
Following the tidyverse principles has made the Sykdomspulsen team efficient, organized and high performing. Fundamentally, putting its human members at the core of a team makes all the difference.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/sykdomspulsen-org/sykdomspulsen-org.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Zhang (2022, July 14). Sykdomspulsen: Tidyverse principles at Sykdomspulsen. Retrieved from https://docs.sykdomspulsen.no/posts/2022-07-14-tidyverse/
BibTeX citation
@misc{zhang2022tidyverse,
  author = {Zhang, Chi},
  title = {Sykdomspulsen: Tidyverse principles at Sykdomspulsen},
  url = {https://docs.sykdomspulsen.no/posts/2022-07-14-tidyverse/},
  year = {2022}
}