Help! Excel Transmogrified My Gene Names!

by

last updated: March 1, 2011

I love that word. Transmogrified. It sounds like something Rob Grant and Doug Naylor, the writers of the sci-fi comedy series, Red Dwarf, would make up. As in, “Kryten and I were transmogrified into another time dimension”. Anyway, enough of 80’s cult TV shows. If you are still with me after the last two articles on using Excel to work your lab results, and you routinely use this business-oriented spreadsheet application for organizing your scientific data, you may be in for a shock. I was! Yes, it happened to me. Excel polymorphed my gene names into the text equivalent of the mutton vindaloo beast. So, if you work with lists of gene names, commonly referred to as the “official gene symbol” within MS Excel, be prepared to check your files. What? How could that happen? Well, MS Excel was designed to be an office application and not a scientific data organizer, so it thinks it is smarter than you are and automatically converts any data that appears to be a calendar date entry, into a calendar date entry. Simply double-clicking the filename in windows or opening a data file from the Excel file menu can cause havoc by automatically transmogrifying your gene symbols (names) into a date format. For example, the gene symbol “Sept1” will auto-magically be converted to 1-Sep without warning. How Rude! That default date format conversion has probably caused more grief to molecular biologists than that dreaded black bulb we used for glass pipettes in first year, or that vacuum suction that pulls only the most precious of samples underneath the lab bench when dropped by accident, and only after weeks of preparation. To see the transmogrification in action, cut and paste “Sept1” into a cell within Excel, or create a test file in csv format with some gene symbols in it like “Sept1, DEC2” etc., try to open it using Excel. The results are not going to be pretty. Once the deed is done, it is not undo-able, forcing you to start over. And if you inadvertently click the save button and overwrite the original file, you and your data are doomed to an eternity of transmogrification. The only remedy I know of is to open Excel first, then import the data, changing the default column data type from general to text along the way. Most irritatingly there is no way to turn this auto-format function off (if you know a trick to permanently switch it off, please let us in on it). Worryingly, many databases containing official gene symbols have been contaminated due to this oversight. Oops! And be careful when using annotation files available from vendors as well… they may have already been transmogrified…! A huge thanks goes out to Barry Zeeberg [1] and friends for pointing this out. Has Excel, or anything else, ever transmogrified your data? And are you a fan of Red Dwarf? [1] Zeeberg, B.R. et al. Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5:80 (2004)

Paul is currently working as a Senior Scientist at Qiagen specializing in Bioinformatics.

More 'Organization and Productivity' articles