Next-generation sequencing has allowed us to study microorganisms in more detail, sequence whole genomes, and increase our understanding of the functionality of many organisms.
It has also allowed us to study microorganisms on a community level, allowing for an understanding of microbial interactions and diversity in different communities.
However, data from these studies has several characteristics that may make simple analysis difficult, namely:
- High dimensionality (more features than samples)
- Large volume
- Complexity
- Sparsity (a high proportion of zeros)
- Compositionality (counts carry relative, not absolute, information)
This article will help you overcome these issues and tell you how to select the best figure for your data, optimize figures for readability and impact, and make them ready for publication.
Familiarize Yourself with R
Most analyses and visualizations of microbiome data are done in R, an open-source language and environment.
Visualization tools help us find potential associations in complex data and make the data easier to read. For example, we can visualize differences in alpha diversity, beta diversity, taxonomic classification, etc.
Because you can do powerful data visualization and analysis in R, it’s a good idea to familiarize yourself with it. You can read more about R and how it can help with your research here.
Step 1: Choose the Right Type of Plot for Your Data
Choosing the correct type of figure to represent your data is the first decision you will need to make.
When choosing a type of figure, consider the following:
- Does it accurately represent your analysis?
- Is it comparable to other studies?
- Are you looking at all your samples or comparing groups?
- What type of analysis are you presenting (e.g., alpha diversity, beta diversity)?
With relative abundance, if you are comparing groups, you could use a bar chart or several pie charts; however, if you are comparing all your samples, a heatmap would be better.
With alpha diversity, a scatterplot would be better if you are comparing samples; however, a box plot would be better if you are comparing groups (or different measures).
With beta diversity, if you are interested in the overall variation between groups of samples, an ordination plot such as a Principal Coordinates Analysis (PCoA) plot would be better.
This is because the reduced dimensionality of ordination plots allows for easier visualization of patterns amongst groups, especially with larger sample sizes.
However, a dendrogram (cluster analysis) or a heatmap may be better for comparing individual samples.
So why is cluster analysis better than ordination plots for individual samples? In ordination plots, samples may overlap, making it difficult to identify relationships between them; with dendrograms, it is easier to see how closely “related” sample 1 and sample 2 are.
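As a rough illustration of the sample-level view, the sketch below clusters a toy count table and draws a dendrogram using base R only. The data are simulated, and the choice of Bray-Curtis dissimilarity with average linkage is an assumption made for the example, not a recommendation from this article.

```r
# Hypothetical toy count table: rows are samples, columns are taxa
set.seed(1)
counts <- matrix(rpois(6 * 8, lambda = 5), nrow = 6,
                 dimnames = list(paste0("sample", 1:6), paste0("taxon", 1:8)))

# Bray-Curtis dissimilarity between two samples: sum|x - y| / sum(x + y)
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

n <- nrow(counts)
bc <- matrix(0, n, n, dimnames = list(rownames(counts), rownames(counts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- bray_curtis(counts[i, ], counts[j, ])
  }
}

# Average-linkage hierarchical clustering on the dissimilarities
hc <- hclust(as.dist(bc), method = "average")
plot(hc, main = "Beta diversity (Bray-Curtis)", xlab = "", sub = "")
```

Samples that join low in the tree are the most similar to each other, which is the pairwise relationship that can be hard to read from an ordination plot.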
Table 1 below explains the different types of plots and the data they are best at representing.
Table 1. The different types of analysis conducted in microbiome studies and how the data can be visualized.
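Aspect of the population studied | Type of analysis | Groups/Samples | Type of plot |
Differences in the taxonomic diversity of individual samples | Alpha diversity | All samples | Scatterplot |
Differences in the taxonomic diversity of individual samples | Alpha diversity | Groups | Box plot |
Differences in the taxonomic diversity between samples | Beta diversity | All samples | Dendrogram or heatmap |
Differences in the taxonomic diversity between samples | Beta diversity | Groups | Ordination plot (e.g., PCoA) |
Differences in taxonomic distribution | Relative abundance | All samples | Heatmap |
Differences in taxonomic distribution | Relative abundance | Groups | Bar chart, pie chart, or bubble plot |
Differences in taxonomic distribution | Differential abundance analysis | Groups | Bar graph |
The core taxa | Define core taxa | Groups and samples | Venn diagram or UpSet plot |
Microbial interactions | Network analysis | Groups and samples | Network plot |
Microbial interactions | Correlation analysis | Groups and samples | Correlogram |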
Figure-specific Considerations
Box Plots
Consider adding jitter (individual data points offset slightly so that they do not overlap) to the plot. This lets the reader see how your samples are distributed. [1]
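A minimal sketch of a box plot with jittered points in ggplot2 might look like the following. The data frame, group names, and Shannon values are simulated for illustration only.

```r
library(ggplot2)

# Hypothetical alpha diversity values for two groups
set.seed(42)
alpha_df <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 15),
  Shannon = c(rnorm(15, mean = 3.0, sd = 0.3), rnorm(15, mean = 3.6, sd = 0.3))
)

p <- ggplot(alpha_df, aes(x = Group, y = Shannon)) +
  # Hide the box plot's own outlier symbols so no sample is drawn twice
  geom_boxplot(outlier.shape = NA) +
  # Offset points horizontally only, so the y values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.6)
print(p)
```

Setting `height = 0` is a deliberate choice here: jittering vertically would shift the plotted diversity values themselves.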
Bar Charts
Don’t use bar graphs at lower taxonomic levels unless rare taxa are aggregated together.
In most studies at lower taxonomic levels, many species/genera are rare, and plotting all of them overcrowds the graph without providing any valuable information.
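One way to aggregate rare taxa is to collapse everything below an abundance cutoff into a single "Other" category before plotting. The sketch below does this in base R on simulated counts; the 2% mean relative abundance threshold is an arbitrary assumption chosen for the example.

```r
# Hypothetical counts: 4 samples x 10 genera, a few abundant and many rare
set.seed(7)
lambda <- rep(c(200, 100, 50, 5, 4, 3, 2, 2, 1, 1), each = 4)
counts <- matrix(rpois(4 * 10, lambda), nrow = 4,
                 dimnames = list(paste0("sample", 1:4), paste0("genus", 1:10)))

# Convert to relative abundance so that each sample (row) sums to 1
rel <- counts / rowSums(counts)

# Flag genera whose mean relative abundance is below the (assumed) cutoff
threshold <- 0.02
rare <- colMeans(rel) < threshold

# Keep abundant genera and sum the rare ones into a single "Other" column
collapsed <- cbind(rel[, !rare, drop = FALSE],
                   Other = rowSums(rel[, rare, drop = FALSE]))

# Stacked bar chart: one bar per sample, one segment per retained genus
barplot(t(collapsed), legend.text = TRUE, las = 2,
        ylab = "Relative abundance")
```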
Pie Charts
These are useful for groups but can also be used to visualize the global composition (composition of all samples) in the study. [2, 3]
Heatmaps
Heatmaps can be used in conjunction with clustering data, i.e., dendrograms. This allows you to visualize both the relationship between samples and their relative abundance.
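Base R's heatmap() function draws row and column dendrograms alongside the heatmap by default, so it can serve as a quick sketch of this idea. The relative abundance matrix below is simulated, and the average-linkage clustering is an assumption made for the example.

```r
# Hypothetical relative abundance matrix: 8 samples x 6 taxa
set.seed(3)
rel <- matrix(runif(8 * 6), nrow = 8,
              dimnames = list(paste0("sample", 1:8), paste0("taxon", 1:6)))
rel <- rel / rowSums(rel)  # each sample sums to 1

# scale = "none" keeps the plotted values as relative abundances;
# rows and columns are reordered by hierarchical clustering
hm <- heatmap(rel, scale = "none",
              hclustfun = function(d) hclust(d, method = "average"),
              margins = c(6, 6), xlab = "Taxon", ylab = "Sample")
```

The returned object holds the row and column order used in the plot (`hm$rowInd`, `hm$colInd`), which can help when you need the same ordering elsewhere.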
Ordination Plots
The choice of ordination method is dependent on several factors, including: [4, 5]
- Is the data distribution linear or unimodal?
- Are you using a distance matrix?
- Would you like to add an environmental variable to the plot?
It’s also a good idea to color the groups being compared, which makes them easy to tell apart. Watch out for overplotting, which is when data points overlap so heavily that you can no longer read the data or spot potential trends; distinct colors and semi-transparent points help reduce its impact.
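The sketch below shows one possible way to build a PCoA plot colored by group. It uses base R's cmdscale() (classical multidimensional scaling, which is equivalent to PCoA) on a Bray-Curtis matrix computed by hand. The counts, group names, and choice of distance are all assumptions for illustration.

```r
library(ggplot2)

# Hypothetical counts: 12 samples x 8 taxa, two groups with different profiles
set.seed(2)
lambda <- rbind(
  matrix(rep(c(30, 20, 10, 5, 5, 2, 2, 1), each = 6), nrow = 6),
  matrix(rep(c(2, 5, 5, 10, 20, 30, 1, 2), each = 6), nrow = 6)
)
counts <- matrix(rpois(length(lambda), lambda), nrow = 12,
                 dimnames = list(paste0("S", 1:12), paste0("taxon", 1:8)))
group <- rep(c("Control", "Treatment"), each = 6)

# Pairwise Bray-Curtis dissimilarities
n <- nrow(counts)
bc <- matrix(0, n, n, dimnames = list(rownames(counts), rownames(counts)))
for (i in seq_len(n)) for (j in seq_len(n)) {
  bc[i, j] <- sum(abs(counts[i, ] - counts[j, ])) / sum(counts[i, ] + counts[j, ])
}

# PCoA: keep two axes and the eigenvalues
pcoa <- cmdscale(as.dist(bc), k = 2, eig = TRUE)
var_explained <- 100 * pcoa$eig[1:2] / sum(pcoa$eig[pcoa$eig > 0])

ord_df <- data.frame(PCo1 = pcoa$points[, 1], PCo2 = pcoa$points[, 2],
                     Group = group)

# Color by group; alpha < 1 keeps overlapping points visible
p <- ggplot(ord_df, aes(x = PCo1, y = PCo2, colour = Group)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(x = sprintf("PCo1 (%.1f%%)", var_explained[1]),
       y = sprintf("PCo2 (%.1f%%)", var_explained[2]))
print(p)
```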
To learn about ordination plots in more detail, check out GUSTAME for helpful descriptions and illustrations.
Venn Diagrams
A Venn diagram is inappropriate if you are comparing four or more groups, as it becomes increasingly difficult to interpret.
You can use an UpSet plot instead, which you can make using the UpSetR package. These diagrams show the intersections between different groups in a matrix layout with bars. [6–8]
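As a hedged sketch, the example below builds the set-membership matrix that an UpSet plot summarizes, then draws the plot only if the UpSetR package is installed. The group names and ASV identifiers are made up for illustration.

```r
# Hypothetical ASVs detected in each of five groups
asv_sets <- list(
  Gut   = c("ASV1", "ASV2", "ASV3", "ASV4"),
  Skin  = c("ASV2", "ASV3", "ASV5"),
  Oral  = c("ASV3", "ASV4", "ASV6"),
  Soil  = c("ASV3", "ASV7"),
  Water = c("ASV1", "ASV3", "ASV7", "ASV8")
)

# Presence/absence matrix: one row per ASV, one column per group
all_asvs <- sort(unique(unlist(asv_sets)))
membership <- sapply(asv_sets, function(s) as.integer(all_asvs %in% s))
rownames(membership) <- all_asvs

# ASVs shared by every group (one simple definition of "core")
core <- all_asvs[rowSums(membership) == length(asv_sets)]

# Draw the UpSet plot only when UpSetR is available
if (requireNamespace("UpSetR", quietly = TRUE)) {
  print(UpSetR::upset(UpSetR::fromList(asv_sets),
                      nsets = length(asv_sets), order.by = "freq"))
}
```

With five groups, a Venn diagram of the same sets would need up to 31 regions, whereas the matrix layout shows only the intersections that actually occur.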
So, that’s a quick overview of the plots you can use to visualize microbiome data. Check out the example plots in the following section to see what they look like.
Example Plots
All images are credited to Tanweer Mahomed.
Differences in Taxonomic Diversity
1. Scatterplot showing alpha diversity at sample level
2. Boxplot showing alpha diversity at group level
3. Dendrogram showing beta diversity at sample level
4. Principal coordinate analysis showing beta diversity at group level
Differences in Taxonomic Distribution
5. Heatmap showing the relative abundance at a sample level
6. Bubble plot showing relative abundance at group level
7. Bar graph showing the relative abundance at a group level
8. Pie chart showing the relative abundance at a group level
9. Bar graph showing the differential abundance between groups at the amplicon sequence variant (ASV) level
The Core Taxa
10. UpSet plot showing the intersection of the ASVs between more than three groups
11. Venn diagram showing the intersection of the ASVs between more than three groups
12. UpSet plot showing the intersection of the ASVs between three groups
13. Venn diagram showing the intersection of the ASVs between three groups
Microbial Interactions
14. Correlogram showing the correlation between different ASVs
15. Network analysis plot showing the correlations between different ASVs
Step 2: Modifying and Improving Your Figure
Now that you’ve chosen the best type of plot and generated your preliminary figure, it’s time to modify it for clarity and impact. When modifying the figure, do not forget to ensure that:
- The figure is easy to interpret by the reader
- It is aesthetically pleasing
All other modifications are subordinate to these. But how can we improve our figures? Here are a few ideas.
Add a Title
If you add a title, you provide more information to your reader, which is especially useful when combining multiple graphs in one figure.
Information that’s great to include in the title includes:
- The type of analysis (e.g., Alpha diversity)
- The comparison (e.g., Type I vs. Type II)
Remember, titles should be informative but not long.
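In ggplot2, a title can be added with labs(). The sketch below uses simulated data and a made-up comparison to show a short title that names both the analysis and the groups.

```r
library(ggplot2)

# Hypothetical alpha diversity values for two groups
set.seed(10)
alpha_df <- data.frame(
  Group = rep(c("Type I", "Type II"), each = 10),
  Shannon = rnorm(20, mean = 3, sd = 0.4)
)

# Title states the analysis and the comparison, and nothing more
p <- ggplot(alpha_df, aes(x = Group, y = Shannon)) +
  geom_boxplot() +
  labs(title = "Alpha diversity: Type I vs. Type II")
print(p)
```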
Add Labels
Labels are a great way to direct the reader’s attention to important features of your plot/figure. For example, you can label data outliers, allowing for quick referencing, or you can label controls.
Avoid labeling all samples if you have a large sample number.
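One way to label only the points of interest is to pass a filtered data frame to geom_text(). In this sketch the data are simulated, and outliers are flagged with the common 1.5 x IQR rule, which is an assumption for the example rather than a rule from this article.

```r
library(ggplot2)

# Hypothetical alpha diversity values; the last sample is unusually low
set.seed(5)
sample_ids <- paste0("S", 1:20)
df <- data.frame(
  Sample = factor(sample_ids, levels = sample_ids),
  Shannon = c(rnorm(19, mean = 3, sd = 0.2), 1.2)
)

# Flag outliers using the 1.5 x IQR rule
q <- quantile(df$Shannon, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
df$Outlier <- df$Shannon < q[1] - 1.5 * iqr | df$Shannon > q[2] + 1.5 * iqr

# Plot every sample, but add text labels for the flagged ones only
p <- ggplot(df, aes(x = Sample, y = Shannon)) +
  geom_point() +
  geom_text(data = df[df$Outlier, ], aes(label = Sample), vjust = -0.8)
print(p)
```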
Ensure the X- and Y-axes are Labeled Correctly
First, ensure that the axes have the correct information and that the scale used is correct.
Second, change the size and face of the font on the title, axis, or labels. This will ensure that all information is readable.
Third, consider changing the background color of the chart; this makes it easier for your reader to visualize data points (Note: ggplot2 default is grey; consider changing to white instead).
Fourth, consider modifying your legend. Where you put your legend can affect the readability of the figure. For example, if your figure is wide, it may be better to put it at the bottom, whereas if it is tall, it may be better to put it on the right.
In some instances, the legend may not provide useful information, and it may be better to remove it, such as in alpha-diversity scatterplots showing all samples.
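The adjustments above map onto a handful of ggplot2 calls. The following is a minimal sketch with simulated data; the font size and legend position are example values, not requirements.

```r
library(ggplot2)

# Hypothetical alpha diversity values for two groups
set.seed(6)
alpha_df <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 10),
  Shannon = rnorm(20, mean = 3, sd = 0.4)
)

p <- ggplot(alpha_df, aes(x = Group, y = Shannon, fill = Group)) +
  geom_boxplot() +
  # Descriptive axis labels
  labs(x = "Sample group", y = "Shannon diversity index") +
  # White background and a larger base font size
  theme_bw(base_size = 14) +
  # Bold axis titles; legend below the plot
  theme(axis.title = element_text(face = "bold"),
        legend.position = "bottom")
print(p)

# To remove the legend entirely, use theme(legend.position = "none")
p_no_legend <- p + theme(legend.position = "none")
```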
Split the Graph Into Groups
By splitting (known in ggplot2 as faceting) the graph into groups, you can retain all the same information but make it easier to read and highlight important data.
For example, splitting a relative abundance bar plot by phylum makes it easier to see which phyla are more prevalent.
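Faceting by phylum can be sketched with facet_wrap(). The sample names, phyla, and abundances below are simulated for illustration.

```r
library(ggplot2)

# Hypothetical relative abundances: 6 samples x 3 phyla
set.seed(9)
df <- expand.grid(
  Sample = paste0("S", 1:6),
  Phylum = c("Firmicutes", "Bacteroidetes", "Proteobacteria")
)
df$Abundance <- runif(nrow(df))
# Normalize so that the phyla within each sample sum to 1
df$Abundance <- df$Abundance / ave(df$Abundance, df$Sample, FUN = sum)

# One panel per phylum; every panel shares the same y-axis by default
p <- ggplot(df, aes(x = Sample, y = Abundance, fill = Phylum)) +
  geom_col() +
  facet_wrap(~ Phylum) +
  labs(y = "Relative abundance")
print(p)
```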
Reorder the Data on the Graph
Sometimes, the default order of data on the graph (alphabetical, numerical, etc.) makes it hard to read, and sometimes it’s better to reorder the data. [9] By default, ggplot2 sorts information alphabetically (for categorical variables). You can, however, reorder it according to:
- Median
- Number of observations
- Data from a different column
- User-specified order
For more information on how to do this, check out the R Graph Gallery.
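As a brief sketch, two of these orderings can be set by changing the factor levels before plotting: reorder() for a median-based order and factor() for a user-specified one. The data and group names are simulated.

```r
library(ggplot2)

# Hypothetical alpha diversity values for three groups
set.seed(12)
df <- data.frame(
  Group = rep(c("A", "B", "C"), each = 10),
  Shannon = c(rnorm(10, 3.5, 0.2), rnorm(10, 2.5, 0.2), rnorm(10, 3.0, 0.2))
)

# Order groups by their median Shannon value (ascending)
df$GroupByMedian <- reorder(df$Group, df$Shannon, FUN = median)

# Order groups in a user-specified sequence
df$GroupManual <- factor(df$Group, levels = c("C", "A", "B"))

# ggplot2 draws categories in factor-level order
p <- ggplot(df, aes(x = GroupByMedian, y = Shannon)) +
  geom_boxplot() +
  labs(x = "Group (ordered by median)")
print(p)
```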
Change the Colors on the Graph
You can color a graph by group using the aes() function: assign either fill or color to a category (a metadata variable or taxonomic name), e.g., aes(fill = Phylum). Each level of the variable is then given its own color.
To change from the default ggplot2 option, you can use different scale_fill and scale_colour options. [10]
- scale_fill_manual() – define the colors to be used manually
- scale_fill_viridis() – from the viridis package (this option is color-blind friendly)
These options use a palette to assign colors to the variable.
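The sketch below applies both approaches to the same simulated bar chart. The hex codes are arbitrary example colors, and to avoid an extra dependency it uses scale_fill_viridis_d(), the discrete viridis scale that ships with ggplot2, in place of the viridis package function.

```r
library(ggplot2)

# Hypothetical mean relative abundances per group
df <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 3),
  Phylum = rep(c("Firmicutes", "Bacteroidetes", "Proteobacteria"), times = 2),
  Abundance = c(0.5, 0.3, 0.2, 0.3, 0.3, 0.4)
)

base_plot <- ggplot(df, aes(x = Group, y = Abundance, fill = Phylum)) +
  geom_col()

# Option 1: assign a specific color to each phylum by name
p_manual <- base_plot +
  scale_fill_manual(values = c(Firmicutes = "#1b9e77",
                               Bacteroidetes = "#d95f02",
                               Proteobacteria = "#7570b3"))

# Option 2: color-blind-friendly viridis palette for discrete data
p_viridis <- base_plot + scale_fill_viridis_d()

print(p_manual)
print(p_viridis)
```

Naming the colors in `values` ties each color to a category, which helps keep the same category the same color across several figures.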
In What to Consider When Choosing Colors for Data Visualization, Lisa Charlotte Muth suggests the following: [11]
- If using more than seven colors, consider another graph type
- If you have multiple graphs (e.g., alpha diversity, beta diversity) that use the same categories, use the same color for the same category throughout the report, poster, or publication
- Use contrast
- Use light colors for low values and dark colors for high values
- Don’t use gradients for categorical data
Discrete vs. Continuous Color
In a nutshell, the use cases for discrete and continuous colors are as follows.
Discrete
- Use with discrete data, i.e., with categorical data
- Useful with bar charts, pie charts
Continuous
- Use with continuous data, i.e., with numerical data over a given range
- Useful with heatmap, correlation graphs
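Be careful with categorical data that is stored as numbers, such as disease scoring (categories are 0, 1, 2, 3, and 4). ggplot2 treats numeric variables as continuous, so convert such a variable to a factor if you want discrete colors.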
Also consider that subtle colors work better with bar plots, and bright colors work better with scatterplots. [10]
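To illustrate the point about numeric categories such as disease scores, the sketch below plots the same simulated data twice: once with the score left as a number (continuous color gradient) and once converted to a factor (discrete colors). The variable names and data are made up.

```r
library(ggplot2)

# Hypothetical ordination coordinates with a 0-4 disease score per sample
set.seed(11)
df <- data.frame(
  PCo1 = rnorm(30),
  PCo2 = rnorm(30),
  Score = sample(0:4, 30, replace = TRUE)
)

# Numeric score: ggplot2 applies a continuous color gradient
p_continuous <- ggplot(df, aes(x = PCo1, y = PCo2, colour = Score)) +
  geom_point(size = 3)

# Factor score: each category gets its own discrete color
df$ScoreF <- factor(df$Score, levels = 0:4)
p_discrete <- ggplot(df, aes(x = PCo1, y = PCo2, colour = ScoreF)) +
  geom_point(size = 3) +
  scale_colour_viridis_d(name = "Disease score")

print(p_continuous)
print(p_discrete)
```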
For more detailed information on color in ggplot2, see the corresponding notes on color scales and legends.
Step 3: Export Your Figure for Publication
Now that you’ve created and optimized your plot, all that’s left to do is ensure that the figure is in the correct format for publication.
Different journals may have different requirements for figure submissions, including:
- Font size and type
- Figure size
- Figure resolution
- Type of file that can be uploaded with the submission
RStudio’s default “Save As” option saves figures with a resolution of 72 dpi, but the ggsave() function allows you to specify width, height, and dpi.
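A minimal ggsave() call might look like the sketch below. The dimensions, resolution, and file name are placeholder values chosen for the example; check your target journal's guidelines for the real requirements.

```r
library(ggplot2)

# Hypothetical plot to export
set.seed(13)
alpha_df <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 10),
  Shannon = rnorm(20, mean = 3, sd = 0.4)
)
p <- ggplot(alpha_df, aes(x = Group, y = Shannon)) +
  geom_boxplot() +
  theme_bw()

# Write to a temporary folder here; use your own path in practice
out_file <- file.path(tempdir(), "alpha_diversity.png")

# Assumed example requirements: 180 mm x 120 mm at 300 dpi
ggsave(out_file, plot = p, width = 180, height = 120, units = "mm", dpi = 300)
```

Passing `plot = p` explicitly avoids relying on the last plot displayed, which is what ggsave() saves by default.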
Many journals limit the number of figures in a publication; you can work within this limit by combining several plots into one multi-panel figure.
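There are several packages for combining plots. As one possible sketch, the example below uses the patchwork package if it is installed; patchwork is an assumption of this example and is not mentioned elsewhere in this article. The data are simulated.

```r
library(ggplot2)

# Hypothetical alpha diversity measures for two groups
set.seed(8)
df <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 10),
  Shannon = rnorm(20, mean = 3, sd = 0.3),
  Richness = rpois(20, lambda = 80)
)

p_shannon <- ggplot(df, aes(x = Group, y = Shannon)) + geom_boxplot()
p_richness <- ggplot(df, aes(x = Group, y = Richness)) + geom_boxplot()

# Combine side by side and tag the panels A, B (only if patchwork is available)
if (requireNamespace("patchwork", quietly = TRUE)) {
  combined <- patchwork::wrap_plots(p_shannon, p_richness, ncol = 2) +
    patchwork::plot_annotation(tag_levels = "A")
  print(combined)
}
```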
You will also need to write a caption for your figure. Try to write one that’s simple and clear, and that explains any abbreviations you’ve introduced in the figure.
How to Visualize Microbiome Data Summarized
From the complexity of microbiomes to the simplicity of visualizations.
Despite the many complications of microbiome data, hopefully this article has helped you understand which type of plot would best suit your analysis and how to present it. There is only one thing left to do: go and start making amazing figures!
For more ideas and instructions on using R in your research, check out these 10 R packages.
And, for information on different packages available for microbiome analysis in R, check out The best practice for microbiome analysis using R.
References
1. Holtz Y. The Boxplot and its pitfalls. Available at: https://www.data-to-viz.com/caveat/boxplot.html. Accessed March 13 2024
2. Tyc O, et al. (2020). Variation in bile microbiome by the etiology of cholestatic liver disease. Liver Transpl 26(12):1652–57
3. Carpenter CM, et al. (2021). tidyMicro: a pipeline for microbiome data analysis and visualization using the tidyverse in R. BMC Bioinformatics 22(1):41
4. Buttigieg PL and Ramette A (2014). A guide to statistical analysis in microbial ecology: a community-focused, living review of multivariate data analyses. FEMS Microbiol Ecol 90(3):543–50
5. Barnett D (2024). Ordination plots. Available at: https://david-barnett.github.io/microViz/articles/web-only/ordination.html. Accessed March 14 2024
6. Lex A (2021). Visualizing Intersecting Sets. Available at: https://upset.app/. Accessed March 14 2024
7. Doering T, et al. (2021). Towards enhancing coral heat tolerance: a “microbiome transplantation” treatment using inoculations of homogenized coral tissues. Microbiome 9(1):102
8. Glendinning L, et al. (2021). Metagenomic analysis of the cow, sheep, reindeer and red deer rumen. Sci Rep 11(1):1990
9. Holtz Y. (2018). Reorder a variable with ggplot2. Available at: https://r-graph-gallery.com/267-reorder-a-variable-in-ggplot2.html. Accessed March 15 2024
10. Wickham H, Navarro D, and Pedersen TL. (2014). ggplot2: Elegant Graphics for Data Analysis. Accessed April 09 2024
11. Muth LC. What to consider when choosing colors for data visualization. Data vis do’s and don’ts. Available at: https://blog.datawrapper.de/colors/. Accessed April 09 2024