New Channels on Bitesize Bio

To help you find information on exactly what you need we're implementing channels, a new way to browse content

Each channel is focused on a specific technique or area and authored/presented by hand-picked authors who are experts in their field. Make sure you don't miss a thing by checking the box below for each channel that interests you.

In return we'll send you one email per month that brings you the latest from your chosen channel(s), along with free members-only content.

Check out our upcoming new channels; Flow Cytometry and Cell Culture, we'll be launching them very soon!

I would like to receive the newsletters for the following channels

Cell Culture
Flow Cytomery
Microscopy & Imaging
Next Generation Sequencing
Writing, Publishing and Presenting
Cloning & Expression


My email address is:

next-gen-sequencing-data

A Painless Way to Extract Data from Raw Text Files

by in Tech Tips
From the Bitesize Bio channel

These days, raw text data sets can be spat out at you from all manner of instrumentation. This automation is efficient, however it can give you a headache when it comes to extracting very specific information from reams of characters and numbers within a raw data file.

Brute force (editing or extracting data from the file by hand) is one option open to you, but this leaves you at the mercy of your own typing accuracy and patience. And it’s not a very efficient way to work.

For data that is compressed into just one column of a spreadsheet, here is an easier option: Use Excel (2007 and later) to split everything apart from text to columns.

In the brief example below, there is a sample number and experimental result for every entry that you need to pull out and copy into your records. Look for a common, delimiting character (in this example it is an underscore,  _  ) and tell Excel to put a separating column line in its place, expanding each entry from one unique cell to seven.

PROJECT#_DATE_INVESTIGATOR_ID#_SAMPLE#_PLATELOCATION#_RESULT

011_05/01/2011_INV3_044_32_A1_496.056

011_05/01/2011_INV1_040_487_A2_25.061

011_05/01/2011_INV2_022_12_A3_51.069

011_05/01/2011_INV5_631_23_A4_552.080

(and on, and on)…

Here is how to do it:

1. Copy and paste the raw text file into your spreadsheet.

2. Select the entire column.

3. Move the ribbon to “Data” and look for the Data Tools section.

4. Click Text to Columns. This begins the Convert Text To Columns Wizard.

5. Select original data type “Delimited – Characters such as commas or tabs separate each field” and click Next.

6. Under Delimiters, uncheck Tab.

7. Check Other and enter the delimiter character (in this example it is an underscore,  _  )  in the text box provided. A data preview window shows how the end result will look. Click Next.

8. You can select column by column to change the data type from general to text, date, or omit a column entirely (if desired).

9. Decide where the data should be placed by selecting a location with Cell Destination.

10. Click Finish.

Data management just got easier. Analysis, graphing, sorting and manipulation based on one or more conditions is effortless. Highlight only the important data and copy it directly to your archives. Isn’t that better than brute force? I think so!

Articles in your inbox

Enter your email to be informed when we publish more articles like this on BsB, and also get access to all of these goodies:

  • Free ebooks and audiobooks on the topics that matter to you
  • Access to Member’s-only articles and Videos
  • Advance notice of new webinars and eBooks
  • Access to make comments and ask questions on BsB



What to read next

Zero Tolerance: A Perfectionist’s Guide to Aseptic Technique

Arguably, molecular biology is impossible without microbiology – even if you work exclusively with transgenic mice, you may one day need to amplify a vector in E. coli. And microbiology is definitely impossible without good aseptic technique. The main principle of good microbiological practice is a zero tolerance approach: it’s good to be a little [...]

The Best of: Tech Tips

Bitesize Bio has gained a lot of new readers over the past few months so I thought it would be a good idea to highlight some of the articles newer readers may have missed. I’ll do this periodically to make sure none of our readers miss any of our great content. So, here are the [...]

How Do YOU Make Sure That Your Western Blots are Evenly Loaded?

For Western blot data to be reliable, it is important that you load known amounts of sample into each lane of the gel.  This is of particular importance if you are doing a quantitative blot, where you really need to be able to compare band intensity in each sample.  In this article, we’ll talk about [...]

PCR Problems? Try an Additive

You’ve tried all the usual stuff, and checked the primer sequences twice, but still can’t get that PCR fragment amplified. It’s time to enter the strange world of PCR additives. Over the years a variety of additives have been shown to enhance PCR reactions in certain situations. Here is a summary of some of the [...]

About the author

Jason Erk

Jason holds a BS in Biochemistry and is a faculty research assistant at an Oregon university. While current research efforts span the fields of behavior, neuroscience, and molecular biology, he manages copious amounts of data and freezers full of...

What do you think?

3 comments

  1. from on

    Great post. Is the proce3ss also applicable to extracting metadata information?

    • from on

      Hi Jack, thanks for the comment.

      I would suspect yes, although it would depend entirely on how the data is structured. It might require several rounds of splitting text to columns (using a different delimiting character each time), splitting the data based on a delimiter that is several characters long (i.e. >< ), or feeding the data file through a custom Perl script to try and do everything at once. If you get the chance to try this with metadata, we would be interested in hearing how it works for you.

  2. from on

    I use MS Word a lot for this, because it gives you so much more control over where to separate one column from the next. The trick is to substitute any text that’s in-between two cells with ^t (Word will add a tab), and then copy and paste all to Excel. Excel will use the tab automatically as cell separator, copying all neatly.

Subscribe to Channels

To receive information about any of our new channels click on the button below.
subscribe to the channel newsletter »

Write for us

Have a short tip, a written
article or a video you'd like
to see published?
write for us »