These days, raw text data sets can be spat out at you from all manner of instrumentation. This automation is efficient, however it can give you a headache when it comes to extracting very specific information from reams of characters and numbers within a raw data file.
Brute force (editing or extracting data from the file by hand) is one option open to you, but this leaves you at the mercy of your own typing accuracy and patience. And it’s not a very efficient way to work.
For data that is compressed into just one column of a spreadsheet, here is an easier option: Use Excel (2007 and later) to split everything apart from text to columns.
In the brief example below, there is a sample number and experimental result for every entry that you need to pull out and copy into your records. Look for a common, delimiting character (in this example it is an underscore, _ ) and tell Excel to put a separating column line in its place, expanding each entry from one unique cell to seven.
PROJECT#_DATE_INVESTIGATOR_ID#_SAMPLE#_PLATELOCATION#_RESULT
011_05/01/2011_INV3_044_32_A1_496.056
011_05/01/2011_INV1_040_487_A2_25.061
011_05/01/2011_INV2_022_12_A3_51.069
011_05/01/2011_INV5_631_23_A4_552.080
(and on, and on)…
Here is how to do it:
1. Copy and paste the raw text file into your spreadsheet.
2. Select the entire column.
3. Move the ribbon to “Data” and look for the Data Tools section.
4. Click Text to Columns. This begins the Convert Text To Columns Wizard.
5. Select original data type “Delimited – Characters such as commas or tabs separate each field” and click Next.
6. Under Delimiters, uncheck Tab.
7. Check Other and enter the delimiter character (in this example it is an underscore, _ ) in the text box provided. A data preview window shows how the end result will look. Click Next.
8. You can select column by column to change the data type from general to text, date, or omit a column entirely (if desired).
9. Decide where the data should be placed by selecting a location with Cell Destination.
10. Click Finish.
Data management just got easier. Analysis, graphing, sorting and manipulation based on one or more conditions is effortless. Highlight only the important data and copy it directly to your archives. Isn’t that better than brute force? I think so!
Great post. Is the proce3ss also applicable to extracting metadata information?
Hi Jack, thanks for the comment.
I would suspect yes, although it would depend entirely on how the data is structured. It might require several rounds of splitting text to columns (using a different delimiting character each time), splitting the data based on a delimiter that is several characters long (i.e. >< ), or feeding the data file through a custom Perl script to try and do everything at once. If you get the chance to try this with metadata, we would be interested in hearing how it works for you.
I use MS Word a lot for this, because it gives you so much more control over where to separate one column from the next. The trick is to substitute any text that’s in-between two cells with ^t (Word will add a tab), and then copy and paste all to Excel. Excel will use the tab automatically as cell separator, copying all neatly.