Using R and ggplot2
Peder Rustøen Braadland
Master in biotechnology (NTNU, 2013)
PhD at the Institute for Cancer Research (2020)
Current:
Postdoc at the NoPSC Research Center
Mostly working with bioinformatics, data analysis, biostatistics…
pbraadland
storytelling with data
Sources: VG.no, OurWorldInData.org, Meteorologisk institutt, NRK.no, sofascore.com
| Context | Priority |
|---|---|
| Art | Data should be beautiful |
| Public information | Clear messages, intuitive, simple |
| Scientific paper | Data accuracy, details, statistics |
| Presentations | Clear messages, engage the audience |
Some examples
The workhorse of academic figures
distribution of y-values?
Bars typically indicate the mean of some parameter, and we should communicate the uncertainty associated with our estimate.
variability around the mean
inferential statistics and hypothesis testing
The box plot overcomes many of the challenges with the bar plot
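A minimal sketch of the two plot types contrasted above, using the built-in `iris` data as a stand-in (not the data shown in the slides). The helper `mean_sd()` is a hypothetical name, and drawing one standard deviation is an assumption; ggplot2's `mean_se()` could be swapped in to show the standard error instead.

```r
library(ggplot2)

# Hypothetical helper: mean plus/minus one standard deviation, returned in the
# y/ymin/ymax format that stat_summary(fun.data = ...) expects.
mean_sd <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

# Bar plot: bar height is the group mean, error bars show +/- 1 SD.
p_bar <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun = mean, geom = "bar", fill = "grey70") +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2)

# Box plot of the same data: median, quartiles, whiskers, and outliers.
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
```

Printing `p_bar` and `p_box` side by side shows how much more of the distribution the box plot exposes for the same three groups.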
More reading: https://nightingaledvs.com/i-stopped-using-box-plots-the-aftermath/
Different distributions can give rise to similar-looking box plots
We see that adding individual points gives the reader more context than a box plot alone
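One way to overlay the individual points on a box plot is sketched below, again with `iris` as stand-in data. The jitter width, transparency, and point size are arbitrary choices, not values from the slides.

```r
library(ggplot2)

set.seed(1)  # makes the horizontal jitter reproducible

p_points <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  # Hide the box plot's own outlier symbols so no observation is drawn twice.
  geom_boxplot(outlier.shape = NA) +
  # Jitter only horizontally (height = 0) so the y-values stay exact.
  geom_jitter(width = 0.15, height = 0, alpha = 0.5, size = 1)
```

Keeping `height = 0` matters here: vertical jitter would move the points away from their measured values.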
Shows distributions as well as the number of samples
Sequential:
Ordered data - low to high
Intuitively, light colors are low, and dark colors are high
Diverging:
Works to illustrate mid values and extreme ends
Typically useful for scaled data
Qualitative:
Suitable for categorical or nominal data
*More on color blindness: blog.datawrapper.de/colorblindness-part2/
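The three palette types can be sketched with scales that ship with ggplot2. The data sets (`faithfuld`, `mtcars`, `iris`) and the specific palettes and hex colors are illustrative assumptions, not the ones used in the slides.

```r
library(ggplot2)

# Sequential: ordered values; direction = 1 keeps low values light and high values dark.
p_seq <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster() +
  scale_fill_distiller(palette = "Blues", direction = 1)

# Diverging: scaled (z-scored) values with a neutral midpoint at 0.
cars <- data.frame(car = rownames(mtcars), mpg_z = as.numeric(scale(mtcars$mpg)))
p_div <- ggplot(cars, aes(x = mpg_z, y = reorder(car, mpg_z), fill = mpg_z)) +
  geom_col() +
  scale_fill_gradient2(low = "#2166ac", mid = "white", high = "#b2182b", midpoint = 0)

# Qualitative: unordered categories get distinct hues.
p_qual <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")
```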
Highlighting specific elements can guide the reader
Or use colors to guide our reader’s attention
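A small sketch of highlighting one element: every bar is grey except the one the reader should notice. The `mpg` data set, the highlighted class, and the colors are assumptions chosen for illustration.

```r
library(ggplot2)

# Count car models per class and flag the one category to emphasize.
counts <- as.data.frame(table(class = mpg$class))
counts$highlight <- counts$class == "suv"

p_highlight <- ggplot(counts, aes(x = reorder(class, Freq), y = Freq, fill = highlight)) +
  geom_col() +
  # Accent color for the highlighted bar, neutral grey for the rest; no legend needed.
  scale_fill_manual(values = c(`TRUE` = "#f17367", `FALSE` = "grey80"), guide = "none") +
  coord_flip() +
  labs(x = NULL, y = "Number of car models")
```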
Emphasizes change over time
Journals typically specify how figures should appear
themes())
ggpubr::ggarrange() or the patchwork package are good choices)
R and ggplot2() and the ggsave() function to meet these requirements
theme() function to all (gg)plots
ggsave() to define:
theme and color palette
library(ggplot2)

# Shared theme for publication figures: small base size, dark grey axes, legend on top.
# Requires the "IBM Plex Sans" font to be installed.
theme_publication <- function() {
  theme_minimal(base_size = 7, base_family = "IBM Plex Sans", base_line_size = 0.2) +
    theme(
      axis.title.y = element_text(color = "#222222"),
      axis.title.x = element_text(color = "#222222"),
      axis.text.y = element_text(color = "#444444"),
      axis.text.x = element_text(color = "#444444"),
      axis.line.y = element_line(color = "#222222"),
      axis.line.x = element_line(color = "#222222"),
      axis.ticks.x = element_line(color = "#222222"),
      axis.ticks.y = element_line(color = "#222222"),
      axis.ticks.length = unit(2, "mm"),
      panel.grid.minor = element_blank(),
      legend.position = "top",
      legend.title = element_blank(),
      legend.text = element_text(color = "#444444"),
      legend.key.height = unit(2, "mm"),
      legend.key.width = unit(2, "mm")
    )
}

# Named color palette.
palette <- c(
  cofactors = "#d4e080",
  lipids = "#f17367",
  nucleotides = "#b3646c",
  peptides = "#ff85a5",
  PCM = "#e06bdf",
  energy = "#71dcca",
  carbohydrates = "#66c4ec",
  `amino acids` = "#ffd38b"
)
ggsave() the plot with dpi = 500 and width = 165 mm
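A hedged sketch of the last step: a named palette is passed to `scale_fill_manual()` and the figure is written with `ggsave()` at the stated resolution and width. The two palette entries are copied from the palette above; the data frame, the file name, the 80 mm height, and the PNG format are made up for illustration, and a plain `theme_minimal()` stands in for `theme_publication()` so the example runs without the IBM Plex Sans font.

```r
library(ggplot2)

# Two entries copied from the palette above; the counts are invented.
pal <- c(lipids = "#f17367", energy = "#71dcca")
toy <- data.frame(class = c("lipids", "energy"), n = c(12, 7))

p_final <- ggplot(toy, aes(x = class, y = n, fill = class)) +
  geom_col() +
  # A named vector matches colors to categories by name, not by order.
  scale_fill_manual(values = pal) +
  theme_minimal(base_size = 7, base_line_size = 0.2)  # stand-in for theme_publication()

# Hypothetical output file; width and dpi follow the values stated above.
out_file <- file.path(tempdir(), "figure_1.png")
ggsave(out_file, plot = p_final, width = 165, height = 80, units = "mm", dpi = 500)
```

Setting `units = "mm"` lets the width be given directly in the unit journals usually quote, instead of converting to inches.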
Keep all illustrations in one large project for easier reuse of vector elements
For a general audience
difference
sequential color palette is intuitive
The presentation will be uploaded to:
https://peder.quarto.pub/blog