Choosing the Perfect Way to Visualize Microbiome Data

Next-generation sequencing has allowed us to study microorganisms in more detail, sequence whole genomes, and increase our understanding of the functionality of many organisms.

It has also allowed us to study microorganisms on a community level, allowing for an understanding of microbial interactions and diversity in different communities.

However, data from these studies has several characteristics that may make simple analysis difficult, namely:

High dimensionality (more features than samples)
Large volumes of data
Complexity
Sparsity (a high proportion of zeros)
Compositionality (abundances are relative rather than absolute)

This article will help you overcome these issues and tell you how to select the best figure for your data, optimize figures for readability and impact, and make them ready for publication.

Familiarize Yourself with R

Most analyses and visualizations of microbiome data are done in R, an open-source language and environment.

Visualization tools help us find potential associations in complex data and make the data easier to read. For example, we can visualize differences in alpha diversity, beta diversity, and taxonomic classification.

Because you can do powerful data visualization and analysis in R, it’s a good idea to familiarize yourself with it. You can read more about R and how it can help with your research here.

Step 1: Choose the Right Type of Plot for Your Data

Choosing the correct type of figure to represent your data is the first decision you will need to make.

When choosing what type of figure, consider the following:

Does it accurately represent your analysis?
Is it comparable to other studies?
Are you looking at all your samples or comparing groups?
What type of analysis are you presenting (e.g., alpha or beta diversity)?

With relative abundance, if you are comparing groups, you could use a bar chart or several pie charts; however, if you are comparing all your samples, a heatmap would be better.

With alpha diversity, a scatterplot would be better if you are comparing samples; however, a box plot would be better if you are comparing groups (or different measures).

With beta diversity, if you are interested in the overall variation between groups of samples, an ordination plot such as a Principal Coordinates Analysis (PCoA) plot would be better.

This is because the reduced dimensionality of ordination plots allows for easier visualization of patterns amongst groups, especially with larger sample sizes.

However, a dendrogram (cluster analysis) or a heatmap may be better for comparing individual samples.

So why is cluster analysis better than ordination plots for individual samples? In ordination plots, samples may overlap, making it difficult to identify relationships between them; in a dendrogram, it is easier to see how closely “related” sample 1 and sample 2 are.
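To make the contrast concrete, here is a minimal base R sketch that draws both views from the same distance matrix. The count table is made up, and the small bray_curtis() helper is a hypothetical stand-in written for this example; in practice, many people compute the distance with vegan::vegdist(method = "bray") instead.

```r
# Hypothetical count table: 6 samples (rows) x 5 taxa (columns)
set.seed(1)
counts <- matrix(rpois(30, lambda = 20), nrow = 6,
                 dimnames = list(paste0("S", 1:6), paste0("Taxon", 1:5)))

# Illustrative helper (not from any package): Bray-Curtis dissimilarity
# between every pair of samples
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}
bc <- bray_curtis(counts)

# Sample-level view: hierarchical clustering drawn as a dendrogram
hc <- hclust(bc, method = "average")
plot(hc, main = "Bray-Curtis dendrogram", xlab = "", sub = "")

# Group-level view: PCoA (classical multidimensional scaling), first two axes
pcoa <- cmdscale(bc, k = 2)
plot(pcoa, xlab = "PCoA 1", ylab = "PCoA 2", pch = 19)
text(pcoa, labels = rownames(pcoa), pos = 3)
```

With only six samples both plots are readable; the dendrogram's advantage shows up once the points in the ordination start to sit on top of each other.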

Table 1 below explains the different types of plots and the data they are best at representing.

Table 1. The different types of analysis conducted in microbiome studies and how the data can be visualized.

Aspects of the population we wish to study	Type of analysis	Groups/Samples	Type of plot
Differences in the taxonomic diversity of individual samples	Alpha diversity	All samples	Scatter plot
Differences in the taxonomic diversity of individual samples	Alpha diversity	Groups	Box plot
Differences in the taxonomic diversity between samples	Beta diversity	All samples	Dendrogram
Differences in the taxonomic diversity between samples	Beta diversity	Groups	Ordination plots
Differences in taxonomic distribution	Relative abundance	All samples	Heatmaps
Differences in taxonomic distribution	Relative abundance	Groups	Pie charts
Differences in taxonomic distribution	Relative abundance	Groups	Bubble charts
Differences in taxonomic distribution	Relative abundance	Groups	Bar graphs
Differences in taxonomic distribution	Differential abundance analysis	Groups	Bar graphs
The core taxa	Define core taxa	Groups and samples	Venn diagrams
Microbial interactions	Network analysis	Groups and samples	Network graphs
Microbial interactions	Correlation analysis	Groups and samples	Correlogram

Figure-specific Considerations

Box Plots

Consider adding jitter (individual data points, offset slightly so that they do not overlap) to the box plot. This lets the reader see how your samples are distributed. [1]
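As a rough illustration, the sketch below overlays jittered points on a box plot with ggplot2. The data frame, its column names, and the diversity values are invented for the example.

```r
library(ggplot2)

# Hypothetical data: Shannon diversity for 30 samples in two groups
set.seed(42)
alpha_df <- data.frame(
  group   = rep(c("Control", "Treatment"), each = 15),
  shannon = c(rnorm(15, mean = 3.0, sd = 0.4),
              rnorm(15, mean = 2.5, sd = 0.4))
)

p <- ggplot(alpha_df, aes(x = group, y = shannon)) +
  # Hide the box plot's own outlier markers so points are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # Spread points horizontally only; height = 0 keeps the y-values exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.7) +
  labs(x = "Group", y = "Shannon diversity")

print(p)
```

Setting height = 0 matters here: jittering vertically would shift the diversity values the reader is trying to read off the axis.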

Bar Charts

Don’t use bar graphs at lower taxonomic levels unless rare taxa are aggregated together.

In most studies at lower taxonomic levels, many species/genera are rare, and plotting all of them overcrowds the graph without providing any valuable information.
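One way to aggregate rare taxa before plotting is sketched below. The genus names, abundances, and the 2% cutoff are all assumptions made for illustration; pick a threshold that makes sense for your own data.

```r
library(ggplot2)

# Hypothetical relative abundances (%) of six genera in two groups
abund <- data.frame(
  group = rep(c("Control", "Treatment"), each = 6),
  genus = rep(c("Bacteroides", "Prevotella", "Faecalibacterium",
                "Genus_X", "Genus_Y", "Genus_Z"), times = 2),
  rel_abund = c(45, 30, 22, 1.5, 1, 0.5,
                50, 20, 27, 0.5, 1.5, 1),
  stringsAsFactors = FALSE
)

# Relabel a genus as "Other" if its mean abundance across groups is below
# the cutoff (2% is an arbitrary threshold chosen for this example)
cutoff <- 2
mean_abund <- tapply(abund$rel_abund, abund$genus, mean)
rare <- names(mean_abund)[mean_abund < cutoff]
abund$genus_plot <- ifelse(abund$genus %in% rare, "Other", abund$genus)

# Sum the rare genera within each group so each bar still totals 100%
agg <- aggregate(rel_abund ~ group + genus_plot, data = abund, FUN = sum)

p <- ggplot(agg, aes(x = group, y = rel_abund, fill = genus_plot)) +
  geom_col() +
  labs(x = "Group", y = "Relative abundance (%)", fill = "Genus")

print(p)
```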

Pie Charts

These are useful for groups but can also be used to visualize the global composition (composition of all samples) in the study. [2, 3]

Heatmaps

Heatmaps can be used in conjunction with clustering data, i.e., dendrograms. This allows you to visualize both the relationship between samples and their relative abundance.
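A minimal sketch of a clustered heatmap, using base R's heatmap() on a made-up relative abundance matrix, might look like this. The function's defaults (Euclidean distance, complete linkage) are used here only for simplicity and may not be the best choice for real microbiome data.

```r
# Hypothetical relative abundance matrix: 8 samples x 6 taxa
set.seed(7)
m <- matrix(runif(48), nrow = 8,
            dimnames = list(paste0("S", 1:8), paste0("Taxon", 1:6)))
m <- m / rowSums(m)  # each sample (row) now sums to 1

# heatmap() clusters rows and columns and draws both dendrograms by default;
# scale = "none" keeps the relative abundances as they are
hm <- heatmap(m, scale = "none", margins = c(6, 4),
              xlab = "Taxon", ylab = "Sample")
```

The returned list holds the row and column order chosen by the clustering (hm$rowInd and hm$colInd), which is handy if you want to reuse that order in another plot.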

Ordination Plots

The choice of ordination method is dependent on several factors, including: [4, 5]

Is the data distribution linear or unimodal?
Are you using a distance matrix?
Would you like to add an environmental variable to the plot?

It’s also a good idea to color the groups being compared so that they are easy to tell apart. Watch out for overplotting too, which is when data points overlap: when this happens, you cannot read your data or spot potential trends.

To learn about ordination plots in more detail, check out GUSTAME for helpful descriptions and illustrations.

Venn Diagrams

Bear in mind that a Venn diagram is inappropriate if you are comparing four or more groups, as it becomes increasingly difficult to interpret.

You can use an UpSet plot instead, which you can make using the UpSetR package. These diagrams show the intersections between different groups in a matrix layout with bars. [6–8]
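The sketch below shows the idea with four made-up groups of ASV IDs. The membership table and intersection counts are computed in base R, and the plot itself is drawn only if UpSetR happens to be installed; the group names and ASV IDs are hypothetical.

```r
# Hypothetical ASV IDs detected in each of four groups
asv_sets <- list(
  Gut  = c("ASV1", "ASV2", "ASV3", "ASV4"),
  Skin = c("ASV2", "ASV3", "ASV5"),
  Oral = c("ASV3", "ASV4", "ASV6"),
  Soil = c("ASV3", "ASV7")
)

# ASVs shared by every group (the "core")
core <- Reduce(intersect, asv_sets)

# Membership matrix: one row per ASV, one logical column per group
all_asvs <- sort(unique(unlist(asv_sets)))
membership <- sapply(asv_sets, function(s) all_asvs %in% s)
rownames(membership) <- all_asvs

# Size of each exclusive intersection, i.e. what the bars of an UpSet plot show
pattern <- apply(membership, 1,
                 function(r) paste(names(asv_sets)[r], collapse = " & "))
print(table(pattern))

# Draw the UpSet plot only if the UpSetR package is available
if (requireNamespace("UpSetR", quietly = TRUE)) {
  print(UpSetR::upset(UpSetR::fromList(asv_sets), nsets = 4, order.by = "freq"))
}
```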

So, that’s a quick overview of the plots you can use to visualize microbiome data. Check out the example plots in the following section to see what they look like, or skip straight to Step 2.

Example Plots

All images are credited to Tanweer Mahomed.

Differences in Taxonomic Diversity

1. Scatterplot showing alpha diversity at sample level

2. Boxplot showing alpha diversity at group level

3. Dendrogram showing beta diversity at sample level

4. Principal coordinate analysis showing beta diversity at group level

Differences in Taxonomic Distribution

5. Heatmap showing the relative abundance at a sample level

6. Bubble plot showing relative abundance at group level

7. Bar graph showing the relative abundance at a group level

8. Pie chart showing the relative abundance at a group level

9. Bar graph showing the differential abundance between groups at the amplicon sequence variant (ASV) level

The Core Taxa

10. UpSet plot (made with UpSetR) showing the intersection of the ASVs between more than three groups

11. Venn diagram showing the intersection of the ASVs between more than three groups

12. UpSet plot (made with UpSetR) showing the intersection of the ASVs between three groups

13. Venn diagram showing the intersection of the ASVs between three groups

Microbial Interactions

14. Correlogram showing the correlation between different ASVs

15. Network analysis plot showing the correlations between different ASVs

Step 2: Modifying and Improving Your Figure

Now that you’ve chosen the best type of plot and generated your preliminary figure, it’s time to modify it for clarity and impact. When modifying the figure, do not forget to ensure that:

The figure is easy to interpret by the reader
It is aesthetically pleasing

All other modifications are subordinate to these. But how can we improve our figures? Here are a few ideas.

Add a Title

If you add a title, you provide more information to your reader, which is especially useful when combining multiple graphs in one figure.

Information that’s great to include in the title includes:

The type of analysis (e.g., alpha diversity)
The comparison (e.g., Type I vs. Type II)

Remember, titles should be informative but not long.

Add Labels

Labels are a great way to direct the reader’s attention to important features of your plot/figure. For example, you can label data outliers, allowing for quick referencing, or you can label controls.

Avoid labeling all samples if you have a large sample number.

Ensure the X- and Y-axes are Labeled Correctly

First, ensure that the axes show the correct information and that the scale used is correct.

Second, adjust the size and face of the font on the title, axes, and labels so that all information is readable.

Third, consider changing the background color of the chart; this makes it easier for your reader to see the data points (note: the ggplot2 default is grey; consider changing it to white).

Fourth, consider modifying your legend. Where you put your legend can affect the readability of the figure. For example, if your figure is wide, it may be better to put the legend at the bottom, whereas if it is tall, it may be better to put it on the right.

In some instances, the legend may not provide useful information, and it may be better to remove it, such as in alpha-diversity scatterplots showing all samples.
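Several of the tweaks above (title, axis labels, font size, white background, legend position) can be combined in a few lines of ggplot2. This is only a sketch on invented data; the group names and values are assumptions.

```r
library(ggplot2)

# Hypothetical alpha diversity values for two groups
set.seed(3)
df <- data.frame(
  group   = rep(c("Type I", "Type II"), each = 10),
  shannon = c(rnorm(10, mean = 3.2, sd = 0.3),
              rnorm(10, mean = 2.7, sd = 0.3))
)

p <- ggplot(df, aes(x = group, y = shannon, fill = group)) +
  geom_boxplot() +
  # Short, informative title plus explicit axis and legend labels
  labs(title = "Alpha diversity: Type I vs. Type II",
       x = "Group", y = "Shannon diversity index", fill = "Group") +
  # theme_bw() swaps the grey background for white; base_size enlarges all text
  theme_bw(base_size = 14) +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "bottom")  # suits a wide figure

print(p)
```

To drop the legend entirely, as suggested for scatterplots that show every sample, use theme(legend.position = "none").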

Split the Graph Into Groups

By splitting (known in ggplot2 as faceting) the graph into groups, you can retain all the same information but make it easier to read and highlight important data.

For example, splitting a relative abundance bar plot by phylum makes it easier to see which phyla are more prevalent.
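Here is a minimal faceting sketch using facet_wrap(). The samples, phyla, and abundances are made up for the example.

```r
library(ggplot2)

# Hypothetical relative abundances (%) of three phyla in four samples.
# expand.grid() varies the first column fastest, so the values below are
# listed sample by sample within each phylum.
facet_df <- expand.grid(
  sample = paste0("S", 1:4),
  phylum = c("Bacteroidetes", "Firmicutes", "Proteobacteria"),
  stringsAsFactors = FALSE
)
facet_df$rel_abund <- c(40, 35, 50, 45,   # Bacteroidetes
                        45, 50, 30, 40,   # Firmicutes
                        15, 15, 20, 15)   # Proteobacteria

p <- ggplot(facet_df, aes(x = sample, y = rel_abund, fill = phylum)) +
  geom_col() +
  facet_wrap(~ phylum) +              # one panel per phylum
  labs(x = "Sample", y = "Relative abundance (%)") +
  theme(legend.position = "none")     # panel titles already name the phylum

print(p)
```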

Reorder the Data on the Graph

Sometimes, the default order of data on the graph (alphabetical, numerical, etc.) makes it hard to read, and sometimes it’s better to reorder the data. [9] By default, ggplot2 sorts information alphabetically (for categorical variables). You can, however, reorder it according to:

Median
Number of observations
Data from a different column
User-specified order

For more information on how to do this, check out the R Graph Gallery.
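Two of the options listed above, ordering by median and a user-specified order, can be sketched with base R's reorder() and factor(). The site names and values are invented for illustration.

```r
library(ggplot2)

# Hypothetical Shannon diversity at three sites (alphabetical order: A, B, C)
reorder_df <- data.frame(
  site    = rep(c("A_site", "B_site", "C_site"), each = 5),
  shannon = c(3.4, 3.6, 3.5, 3.3, 3.7,   # A_site, median 3.5
              2.1, 1.9, 2.0, 2.2, 1.8,   # B_site, median 2.0
              2.8, 2.9, 2.7, 3.0, 2.6),  # C_site, median 2.8
  stringsAsFactors = FALSE
)

# Order the sites by their median diversity (lowest first)
reorder_df$site_by_median <- reorder(reorder_df$site, reorder_df$shannon,
                                     FUN = median)

# Or impose an order of your own by setting the factor levels explicitly
reorder_df$site_manual <- factor(reorder_df$site,
                                 levels = c("C_site", "A_site", "B_site"))

p <- ggplot(reorder_df, aes(x = site_by_median, y = shannon)) +
  geom_boxplot() +
  labs(x = "Site (ordered by median)", y = "Shannon diversity")

print(p)
```

ggplot2 follows the factor's level order on the axis, so swapping site_by_median for site_manual in aes() is all it takes to switch between the two.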

Change the Colors on the Graph

You can change colors on a graph based on different groups by mapping a variable inside the aes() function. This is done by assigning either fill or color to a category (a metadata variable or taxonomic name), e.g., aes(fill = Phylum). This assigns a color to each level of the variable.

To change from the default ggplot2 option, you can use different scale_fill and scale_colour options. [10]

scale_fill_manual() – define the colors to be used manually
scale_fill_viridis() – from the viridis package (this option is color-blind friendly)

These options use a palette to assign colors to the variable.
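The sketch below shows both approaches on a made-up data set. Note one substitution: to avoid an extra dependency it uses scale_fill_viridis_d(), the discrete viridis scale that ships with ggplot2, rather than scale_fill_viridis() from the viridis package. The hex codes are arbitrary choices for the example.

```r
library(ggplot2)

# Hypothetical relative abundances (%) of three phyla
col_df <- data.frame(
  phylum    = c("Bacteroidetes", "Firmicutes", "Proteobacteria"),
  rel_abund = c(45, 40, 15)
)

base_plot <- ggplot(col_df, aes(x = phylum, y = rel_abund, fill = phylum)) +
  geom_col() +
  labs(x = "Phylum", y = "Relative abundance (%)", fill = "Phylum")

# Option 1: pick the colors yourself; naming the vector ties each color
# to a specific category
phylum_cols <- c(Bacteroidetes  = "#1b9e77",
                 Firmicutes     = "#d95f02",
                 Proteobacteria = "#7570b3")
p_manual <- base_plot + scale_fill_manual(values = phylum_cols)

# Option 2: a color-blind-friendly viridis palette (discrete version)
p_viridis <- base_plot + scale_fill_viridis_d()

print(p_manual)
print(p_viridis)
```

A named vector like phylum_cols can be reused in every plot of a project, which keeps each category the same color from figure to figure.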

In What to Consider When Choosing Colors for Data Visualization, Lisa Charlotte Muth suggests the following: [11]

If you are using more than seven colors, consider another graph type
If you have multiple graphs (e.g., alpha diversity, beta diversity) that use the same categories, use the same color for the same category throughout the report, poster, or publication
Use contrast
Use light colors for low values and dark colors for high values
Don’t use gradients for categorical data

Discrete vs. Continuous Color

In a nutshell, the use cases for discrete and continuous colors are as follows.

Discrete

Use with discrete data, i.e., with categorical data
Useful with bar charts and pie charts

Continuous

Use with continuous data, i.e., with numerical data over a given range
Useful with heatmaps and correlograms

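Be careful with numeric data that are really categorical, such as disease scores (categories 0, 1, 2, 3, and 4). ggplot2 treats a numeric column as continuous, so convert such a variable to a factor if you want a distinct color for each category.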
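A short sketch of the disease-score case: the same column gives a continuous gradient when left numeric and a discrete palette once it is converted to a factor. The coordinates and scores are invented for the example.

```r
library(ggplot2)

# Hypothetical ordination coordinates with a disease score from 0 to 4
score_df <- data.frame(
  pc1   = c(-1.2, -0.4, 0.3, 0.9, 1.5, -0.8),
  pc2   = c(0.5, -0.9, 1.1, -0.3, 0.2, -1.0),
  score = c(0, 1, 2, 3, 4, 2)
)

# Left as a number, the score is mapped to a continuous color gradient
p_continuous <- ggplot(score_df, aes(x = pc1, y = pc2, colour = score)) +
  geom_point(size = 3)

# Converted to a factor, each score gets its own discrete color
score_df$score_f <- factor(score_df$score, levels = 0:4)
p_discrete <- ggplot(score_df, aes(x = pc1, y = pc2, colour = score_f)) +
  geom_point(size = 3) +
  labs(colour = "Disease score")

print(p_continuous)
print(p_discrete)
```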

Also consider that subtle colors work better with bar plots, and bright colors work better with scatterplots. [10]

For more detailed information on color in ggplot2, see the corresponding notes on color scales and legends.

Step 3: Export Your Figure for Publication

Now that you’ve created and optimized your plot, all that’s left to do is ensure that the figure is in the correct format for publication.

Different journals may have different requirements for figure submissions, including:

Font size and type
Figure size
Figure resolution
Type of file that can be uploaded with the submission

RStudio’s default “Save As” option saves figures with a resolution of 72 dpi, but the ggsave() function allows you to specify width, height, and dpi.
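A minimal ggsave() sketch is shown below. The 7 x 5 inch size and 300 dpi are placeholder values, not any journal's requirements, and the file is written to a temporary folder purely so the example cleans up after itself.

```r
library(ggplot2)

# Hypothetical alpha diversity data and a simple plot to export
export_df <- data.frame(
  group   = rep(c("Control", "Treatment"), each = 4),
  shannon = c(3.1, 2.9, 3.3, 3.0, 2.4, 2.6, 2.5, 2.7)
)
p <- ggplot(export_df, aes(x = group, y = shannon)) +
  geom_boxplot() +
  labs(x = "Group", y = "Shannon diversity")

# Save as PNG with an explicit size and resolution; the file type is taken
# from the extension. Replace the path and values with your journal's.
out_file <- file.path(tempdir(), "figure1.png")
ggsave(out_file, plot = p, width = 7, height = 5, units = "in", dpi = 300)
```

Passing plot = p explicitly avoids accidentally saving whichever plot was displayed last, which is what ggsave() does when the argument is left out.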

Many journals limit the number of figures in a publication; you can circumvent this by creating one large figure with several plots.

You will also need to write a caption for your figure. Try to write one that’s simple and clear, and that explains any abbreviations you’ve introduced in the figure.

How to Visualize Microbiome Data Summarized

From the complexity of microbiomes to the simplicity of visualizations.

Despite the many complications of microbiome data, hopefully this article has helped you understand which type of plot best suits your analysis and how to present it. There is only one thing left to do: go and start making amazing figures!

For more ideas and instructions on using R in your research, check out these 10 R packages.

And, for information on different packages available for microbiome analysis in R, check out The best practice for microbiome analysis using R.

References

1. Holtz Y. The Boxplot and its pitfalls. Available at: https://www.data-to-viz.com/caveat/boxplot.html. Accessed March 13 2024

2. Tyc O, et al. (2020). Variation in bile microbiome by the etiology of cholestatic liver disease. Liver Transpl 26(12):1652–57

3. Carpenter CM, et al. (2021). tidyMicro: a pipeline for microbiome data analysis and visualization using the tidyverse in R. BMC Bioinformatics 22(1):41

4. Buttigieg PL and Ramette A (2014). A guide to statistical analysis in microbial ecology: a community-focused, living review of multivariate data analyses. FEMS Microbiol Ecol 90(3):543–50

5. Barnett D (2024). Ordination plots. Available at: https://david-barnett.github.io/microViz/articles/web-only/ordination.html. Accessed March 14 2024

6. Lex A (2021). Visualizing Intersecting Sets. Available at: https://upset.app/. Accessed March 14 2024

7. Doering T, et al. (2021). Towards enhancing coral heat tolerance: a “microbiome transplantation” treatment using inoculations of homogenized coral tissues. Microbiome 9(1):102

8. Glendinning L, et al. (2021). Metagenomic analysis of the cow, sheep, reindeer and red deer rumen. Sci Rep 11(1):1990

9. Holtz Y. (2018). Reorder a variable with ggplot2. Available at: https://r-graph-gallery.com/267-reorder-a-variable-in-ggplot2.html. Accessed March 15 2024

10. Wickham H, Navarro D, and Pedersen TL. (2014). ggplot2: Elegant Graphics for Data Analysis. Accessed April 09 2024

11. Muth LC. What to consider when choosing colors for data visualization. Data vis do’s and don’ts. Available at: https://blog.datawrapper.de/colors/. Accessed April 09 2024

Tanweer Goolam Mahomed

I have a PhD in Medical Microbiology from the University of Pretoria. I am currently a post-doctoral fellow at the Forestry and Agricultural Biotechnology Institute (FABI) based at the University of Pretoria. I use molecular biology and bioinformatics to try and detangle the role that microorganisms play in our lives.
