
Help! Excel Transmogrified My Gene Names!

Posted in: Organization and Productivity

I love that word. Transmogrified.

It sounds like something Rob Grant and Doug Naylor, the writers of the sci-fi comedy series Red Dwarf, would make up, as in “Kryten and I were transmogrified into another time dimension.”

Anyway, enough about ’80s cult TV shows. If you are still with me after the last two articles on using Excel to work with your lab results, and you routinely use this business-oriented spreadsheet application to organize your scientific data, you may be in for a shock. I was!

Yes, it happened to me. Excel polymorphed my gene names into the text equivalent of the mutton vindaloo beast.

So, if you work with lists of gene names (commonly referred to as “official gene symbols”) in MS Excel, be prepared to check your files.

What? How could that happen?

Well, MS Excel was designed to be an office application and not a scientific data organizer, so it thinks it is smarter than you are and automatically converts any data that appears to be a calendar date entry into a calendar date entry.

Simply double-clicking the filename in Windows, or opening a data file from Excel’s File menu, can cause havoc by automatically transmogrifying your gene symbols (names) into dates.

For example, the gene symbol “Sept1” will auto-magically be converted to 1-Sep without warning.

How Rude!
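If you would rather find out which symbols are at risk before Excel gets its hands on them, a quick screen outside Excel can help. Below is a minimal Python sketch, assuming that the risky symbols are those made of a month-like prefix (such as SEPT, MARCH or DEC) followed directly by digits. The prefix list and the `at_risk` helper are my own illustrative assumptions, not an official rule set, so expect it to miss some cases.

```python
import re

# Month-like prefixes that Excel may read as the start of a date.
# ASSUMPTION: an illustrative list, not an exhaustive or official one.
MONTH_PREFIXES = ["JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN", "JUL",
                  "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC"]

# Whole symbol = month-like prefix immediately followed by one or more digits,
# compared case-insensitively ("Sept1", "SEPT1" and "sept1" all match).
_AT_RISK = re.compile(r"^(?:" + "|".join(MONTH_PREFIXES) + r")\d+$", re.IGNORECASE)


def at_risk(symbol: str) -> bool:
    """Return True if `symbol` looks like text Excel could turn into a date."""
    return _AT_RISK.match(symbol.strip()) is not None


if __name__ == "__main__":
    symbols = ["Sept1", "DEC2", "MARCH1", "TP53", "DECR1"]
    print([s for s in symbols if at_risk(s)])  # ['Sept1', 'DEC2', 'MARCH1']
```

Symbols like DECR1, where the digits do not follow the month-like prefix directly, are deliberately left alone by this heuristic.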

That default date conversion has probably caused molecular biologists more grief than the dreaded black bulb we used on glass pipettes in first year, or the vacuum suction that pulls only the most precious of samples underneath the lab bench when you drop them by accident, and only after weeks of preparation.

To see the transmogrification in action, type or paste “Sept1” into a cell in Excel, or create a test CSV file containing a few gene symbols such as “Sept1” and “DEC2” and open it in Excel. The results are not going to be pretty.
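If you would rather generate the test file than type it, here is a minimal Python sketch using the standard `csv` module. The file name `test_symbols.csv` and the extra control symbol TP53 are my own assumptions for illustration. Reading the file back with `csv` (rather than Excel) returns the symbols unchanged, since that module treats every field as plain text; open the same file in Excel to watch the transmogrification happen.

```python
import csv

# "Sept1" and "DEC2" come from the text; "TP53" is a hypothetical control
# symbol that should not look like a date.
symbols = ["Sept1", "DEC2", "TP53"]


def write_test_csv(path: str, values: list[str]) -> None:
    """Write one gene symbol per row under a 'symbol' header."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["symbol"])
        for v in values:
            writer.writerow([v])


def read_symbols(path: str) -> list[str]:
    """Read the 'symbol' column back; csv returns every field as a string."""
    with open(path, newline="") as fh:
        return [row["symbol"] for row in csv.DictReader(fh)]


if __name__ == "__main__":
    write_test_csv("test_symbols.csv", symbols)  # hypothetical file name
    print(read_symbols("test_symbols.csv"))  # ['Sept1', 'DEC2', 'TP53']
```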

Once the deed is done, it cannot be undone, so you have to start over. And if you inadvertently click Save and overwrite the original file, you and your data are doomed to an eternity of transmogrification.

The only remedy I know of is to open Excel first and then import the data, changing the column data format from General to Text along the way. Most irritatingly, there is no way to turn this auto-format function off (if you know a trick to switch it off permanently, please let us in on it).

Worryingly, many databases containing official gene symbols have been contaminated because of this oversight. Oops! And be careful when using annotation files from vendors as well… they may already have been transmogrified!
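To check whether a file you were given has already been contaminated, you can scan a column of symbols for entries that look like dates. This is a minimal Python sketch, assuming the converted values were saved in the “1-Sep” style described above or in a “Sep-01” style; the formats Excel actually writes depend on its version and regional settings, and the `find_transmogrified` helper is hypothetical, so treat its output as a list of suspects to check by hand.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# ASSUMPTION: converted symbols appear as "1-Sep" (day-month) or "Sep-01"
# (month, then two digits). Other Excel versions or locales may write
# other formats, so this is only a first-pass filter.
_DATE_LIKE = re.compile(
    r"^(?:\d{1,2}-(?:" + MONTHS + r")|(?:" + MONTHS + r")-\d{2})$",
    re.IGNORECASE,
)


def find_transmogrified(values):
    """Return (row_index, value) pairs whose text looks like a converted date."""
    return [(i, v) for i, v in enumerate(values) if _DATE_LIKE.match(v.strip())]


if __name__ == "__main__":
    column = ["TP53", "1-Sep", "DEC2", "Sep-01", "1-DEC"]
    for row, value in find_transmogrified(column):
        print(f"row {row}: {value!r} looks like a date, check the original symbol")
```

Mapping a flagged value back to its original symbol still needs a human, since the converted date no longer records how the original symbol was spelled.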

A huge thanks goes out to Barry Zeeberg [1] and friends for pointing this out.

Has Excel, or anything else, ever transmogrified your data?
And are you a fan of Red Dwarf?

[1] Zeeberg, B.R. et al.
Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics.
BMC Bioinformatics 5:80 (2004)


4 Comments

  1. Nick on February 24, 2009 at 3:49 pm

    Thanks Shannon!

    If anyone wants to get clued up on the origins of Transmogrification, check out the Calvin and Hobbes Wiki…

    https://calvinandhobbes.wikia.com/wiki/Transmogrifier

  2. Guy on February 24, 2009 at 3:21 pm

    (1) MS Exhell also helpfully eats leading zeros from strain designations, etc. … and is known to almost randomly do odd things with quotes (inserting them or dropping them). If only there was an alternative that did what it CAN do!

    (2) shannon: I too originally thought that Bill Watterson had invented transmogrification (a Transmogrifier has long been on my equipment wish list, along with an Interocitor and a DNA Extrapolator or Interpolator). But a quick check in the OED revealed that the word dates back to 1656! Calvin and Hobbes was such an educational experience.

  3. shannon on February 24, 2009 at 2:10 pm

    Transmogrification was invented by Bill Watterson, through Calvin and Hobbes. I remember its attribution to an abandoned cardboard carton that had once held a large appliance…

  4. Nick on February 24, 2009 at 6:33 am

    “MS Excel was designed to be an office application and not a scientific data organizer, so it thinks it is smarter than you are”

    You’ve got to love Microsoft.
