Bioinformatics is perhaps best known as the collection of advanced methods for analyzing genomics data, but it’s also a catchall term for any way of studying biology that involves computer programming. By that definition, it’s everywhere these days! If you’re a bench biologist, you’ve probably been exposed to more bioinformatics than you think. If you’re a structural biologist and you’ve used circular dichroism followed by DICHROWEB to deconvolute your data, or you’re a neuroscientist and you’ve mapped a brain using Jupyter Notebook, you’ve already used bioinformatics. In my personal experience, using bioinformatics one way has helped train my brain for other ways to use it, because the logic is similar even though the applications can be wildly different. Here are just a few of the diverse ways to use bioinformatics:
Proteomics
Computational approaches are well established for making sense of DNA and RNA data, so it stands to reason that protein studies benefit from bioinformatics in the same way. You may want to know all the proteins in a cell, or in an organelle, or even just the interactome of a protein with many potential partners. In all these situations you could run mass spectrometry on your samples and get a set of masses that correspond to the different proteins in your sample.
How can you “read” which proteins are in your sample from such an insane number of peaks, which not only are numerous but also interfere with each other? First, your mass spectrometer should come with peak-picking software that separates true peaks from noise (it will likely take some tinkering with the settings to get it to read correctly). Next, a number of programs are available online to match your peaks to a protein sequence database, which is usually one or more of the proteome databases covered by UniProt. MaxQuant and X! Tandem search the spectra themselves against the database, while PepNovo performs de novo sequencing of peptides directly from the spectra, allowing you to then search the resulting sequences against the database, ideally using a search tool such as InsPecT.
Data can be further analyzed and refined with Scaffold, software that speeds up protein identification through high-throughput batch processing, classifies the biological relevance and intracellular locations of detected proteins, and distinguishes among isoforms and post-translationally modified forms of a protein. The functionalities, costs, system requirements, and download sites of these proteomics software packages are listed in Table 1.
| Software | Function | Cost | Operating systems | Website |
|---|---|---|---|---|
| X! Tandem | Matches tandem mass spectra to peptide sequences in a protein database | Free | Windows, Linux, macOS | https://www.thegpm.org/TANDEM/instructions.html |
| PepNovo | Sequences peptides de novo from their mass spectra without attempting to match spectra to protein databases | Free | Windows, Linux | http://proteomics.ucsd.edu/Software/PepNovo/ (click ‘Download’) |
| InsPecT | Database search tool coordinated to work alongside PepNovo and other proteomics software | Free | Windows, Linux/Unix | http://proteomics.ucsd.edu/Software/Inspect/ (click ‘Download’) |
| MaxQuant | Identifies and quantifies proteins in a sample via their mass spectra, giving summary statistics for how likely each match is to be accurate | Free | Windows, Linux | http://www.coxdocs.org/doku.php?id=maxquant:common:download_and_installation#download_and_installation_guide |
| Scaffold | Rapid identification of proteins by high-throughput batch processing and deeper analysis into biological relevance | Starts at $5,795 for academic institutions and $6,795 for commercial use | Windows, Linux, macOS | http://www.proteomesoftware.com/products/scaffold/download/ |
Table 1. Proteomics software: the breakdown.
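To make the idea of matching peaks to a sequence database a little more concrete, here is a minimal Python sketch of the simplest version of the problem: digest each database protein in silico with trypsin, compute each peptide's monoisotopic mass, and report which observed masses fall within a tolerance. This is only an illustration and not how any of the packages in Table 1 work internally; real search engines score fragment (MS/MS) spectra rather than intact peptide masses. The two-protein database, the function names, and the 0.01 Da tolerance are all made up for the example, and the observed values are treated as neutral masses rather than m/z.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_peaks(observed_masses, database, tolerance=0.01):
    """Return {protein: [(observed_mass, peptide), ...]} for matches within tolerance."""
    hits = {}
    for name, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            theoretical = peptide_mass(peptide)
            for observed in observed_masses:
                if abs(observed - theoretical) <= tolerance:
                    hits.setdefault(name, []).append((observed, peptide))
    return hits


# Hypothetical two-protein "database"; a real one would be a UniProt FASTA file.
database = {"protein_A": "MKTAYIAKQR", "protein_B": "GASPVKLLER"}

# Pretend the instrument reported these two neutral masses.
observed = [peptide_mass("TAYIAK") + 0.002, 999.0]
print(match_peaks(observed, database))
```

Even in this toy form you can see why the settings matter: widen the tolerance and unrelated peptides start to match, tighten it and real hits drop out.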
Structural Biology
To envision the 3-D structure of a molecule, typically a protein or nucleic acid, structural biologists (myself included) often use visualization software such as PyMOL and VMD. Doing this has helped us gain perspective, literally and figuratively, on structures: for instance, seeing how large a hydrophobic surface a protein has. Many structural biologists take visualization a step further and run simulations of molecular behavior, such as an ion channel shifting from a “closed” to an “open” conformation or a peptide binding to a phospholipid membrane. NAMD (developed by the same group as VMD and commonly used alongside it), Amber, and GROMACS are examples of software packages used to run such molecular dynamics simulations.
While PyMOL (in its open-source build), VMD/NAMD (for non-commercial use), and GROMACS can all be downloaded free of charge, Amber charges license fees for the full version ($500 for academic and other non-profit use and $15,000-$20,000 for industrial use), although a basic version called AmberTools is free. All of these software packages are compatible with macOS, Linux, and Windows 10.
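If you’re curious what a molecular dynamics engine does at each time step, the sketch below may help. It applies velocity Verlet integration, a scheme commonly used in molecular dynamics, to the smallest system I could think of: a single particle on a harmonic spring in one dimension, standing in for one vibrating bond. The spring constant, mass, time step, and function names are arbitrary choices for illustration; real packages integrate many thousands of atoms under a full force field.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1-D; returns lists of positions and velocities."""
    xs, vs = [x], [v]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position with current acceleration
        a_new = force(x) / mass              # recompute force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with the average acceleration
        a = a_new
        xs.append(x)
        vs.append(v)
    return xs, vs


# Toy system in arbitrary units (assumed values, not taken from any force field).
K_SPRING = 1.0
MASS = 1.0


def spring_force(x):
    return -K_SPRING * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


xs, vs = velocity_verlet(1.0, 0.0, spring_force, MASS, dt=0.01, n_steps=1000)
elapsed = 0.01 * 1000
analytic = math.cos(math.sqrt(K_SPRING / MASS) * elapsed)
drift = total_energy(xs[-1], vs[-1]) - total_energy(xs[0], vs[0])
print(f"final position: {xs[-1]:.4f} (analytic: {analytic:.4f})")
print(f"energy drift: {drift:.2e}")
```

The energy check at the end is the kind of sanity test people run on real simulations too: if total energy drifts badly, the time step is probably too large.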
Biodiversity Informatics
Biodiversity informatics, a term coined in 1992, refers to a distinct branch of bioinformatics that uses big data to map out species, discover new ones, work out a universal taxonomy system, and track transitions of species into and out of endangerment. The Wikipedia page on this field features quite a thorough list of databases and growing projects tabulating biodiversity information discovered around the world, including Catalogue of Life, Biodiversity Heritage Library, and International Plant Names Index.
Citizen Scientist by Mary Ellen Hannibal is a wonderful book that talks about people with and without PhDs voraciously and meticulously exploring our ecosystem to collect information on what species exist, how they survive, whether they’re doing well, whether they or their surroundings are changing, and if the changes might be human-generated. Hannibal introduces us to many projects including iNaturalist, a crowdsourcing platform and mobile app that allows scientists and non-scientists alike to record their biodiversity findings, pool them with others, and even help interpret data from their own homes (e.g., by matching species across projects). Data from iNaturalist can be fed into the more curated databases mentioned above, such as Catalogue of Life.
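Pooling findings and matching species across projects is, at its simplest, a record-merging problem. Here is a small Python sketch of that idea, assuming each project hands you a list of records with a species name and an optional count. The record layout, the project data, and the function names are invented for this example and are not the iNaturalist data format. Real name matching also has to handle synonyms and misspellings, which this deliberately ignores.

```python
from collections import Counter


def normalize_name(name):
    """Standardize a binomial name: collapse whitespace, 'Genus species' casing."""
    parts = name.split()
    if not parts:
        return ""
    genus = parts[0].capitalize()
    rest = [p.lower() for p in parts[1:]]
    return " ".join([genus] + rest)


def pool_observations(*projects):
    """Sum observation counts per normalized species name across projects."""
    counts = Counter()
    for records in projects:
        for record in records:
            counts[normalize_name(record["species"])] += record.get("count", 1)
    return counts


# Hypothetical records from two hypothetical projects.
project_a = [
    {"species": "danaus plexippus", "count": 3},
    {"species": "Sequoia sempervirens"},
]
project_b = [{"species": "Danaus  Plexippus", "count": 2}]

pooled = pool_observations(project_a, project_b)
for species, count in sorted(pooled.items()):
    print(species, count)
```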
Macros for Everything!
Ever use an app and wish there were a button for a specific thing you want to do? That’s where macros come in. Macros are basically scripts that add a function to a program you use and allow you to automate repetitive and often laborious tasks. This is especially useful in Excel. For example, when I want to select all the charts in a worksheet at once, I open up the Visual Basic editor and write a few lines of code.
This kind of technique isn’t limited to Microsoft Office. A colleague of mine once wrote scripts to program our spectrofluorometer to take timepoints all by itself. (If only we could make our Western blots block and wash themselves!) For another example, ImageJ, a free and popular program for interpreting images from microscopes, is basically half easy clicks and half macros that take it to the next level. This is all still bioinformatics, as it’s novel computer programming all in the name of biology!
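The same “write it once, run it on everything” idea carries over to any scripting language. As a rough Python analogue of a macro (not the Visual Basic or ImageJ macro code mentioned above), the sketch below applies one summary routine to every run in a batch. The run names, the `time_s` and `intensity` column headers, and the readings themselves are hypothetical; in practice you would read exported files from disk instead of the inline strings used here.

```python
import csv
import io
from statistics import mean


def summarize_run(csv_text):
    """Summarize one exported run: number of timepoints, mean and max intensity."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row["intensity"]) for row in reader]
    return {"n": len(values), "mean": mean(values), "max": max(values)}


def summarize_all(runs):
    """Apply the same summary to every run, like a macro looping over files."""
    return {name: summarize_run(text) for name, text in runs.items()}


# Hypothetical exports, inlined as strings so the example is self-contained.
runs = {
    "sample_1": "time_s,intensity\n0,10.0\n30,12.0\n60,14.0\n",
    "sample_2": "time_s,intensity\n0,5.0\n30,5.5\n",
}

summary = summarize_all(runs)
for name, stats in summary.items():
    print(name, stats)
```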
These are just a few of the ways that scientists from diverse fields use bioinformatics. What others can you think of? Leave a comment below with your top uses and software.