How Text-Mining Tools Can Improve Your Literature Searches

Before starting any new research project, it’s essential that you have as complete an understanding as possible of the current research literature. Knowing what other people have done will prevent you from duplicating existing work, and will perhaps indicate under-explored niches. If you work in the same subject area over a number of years, you will accrue this knowledge from your own reading and from your colleagues and supervisor. But how can you get up to speed quickly in a new area? Perhaps your 2-hybrid / ChIP / microarray experiment has suggested some possible interacting proteins you know nothing about. Or maybe you’re formulating a new hypothesis and want to find supporting evidence in the literature.

PubMed is most scientists’ first port of call for literature searches, and there are many fine tutorials on this site explaining how to get the most from this tool. However, many people don’t know that there are also several promising text-mining tools that offer more sophisticated functions, such as semantic searches, and are quite accessible to experimental biologists. These tools analyse the free text of publicly available MEDLINE records, extract the relationships between terms, index those relationships, and present results almost instantly.

Here, I’ll highlight three text-mining tools developed by the UK’s National Centre for Text Mining (NaCTeM) in Manchester. All of these tools have convenient web interfaces, so it’s easy to give them a try and see if they are useful to you.

MEDIE is a semantic search engine for relations between biological entities. You query it with a subject-verb-object pattern, for example, ‘myprotein-causes-cancer’ or ‘mygene-regulates-lipid metabolism’. A nice feature of this tool is that results are returned with snippets of text showing the sentences from which the relationships were deduced, so you can assess their relevance very quickly.
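To make the subject-verb-object idea concrete, here is a minimal Python sketch of how such a query could be matched against relations that have already been extracted from text. This is not MEDIE's code or API: the `Relation` type, the `INDEX` list, and the `query` function are hypothetical names I made up, and the example sentences are placeholders rather than real findings.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str
    verb: str
    obj: str
    sentence: str  # evidence sentence, shown to the user as a snippet


# Hypothetical, hand-written index using placeholder names.
# A real tool would build this by parsing many abstracts.
INDEX = [
    Relation("myprotein", "cause", "cancer",
             "Myprotein causes cancer in this placeholder sentence."),
    Relation("mygene", "regulate", "lipid metabolism",
             "Mygene regulates lipid metabolism in this placeholder sentence."),
    Relation("mygene", "cause", "cancer",
             "Mygene causes cancer in this placeholder sentence."),
]


def query(subject=None, verb=None, obj=None, index=INDEX):
    """Return relations matching every filled-in slot; None is a wildcard."""
    def matches(wanted, actual):
        return wanted is None or wanted.lower() == actual.lower()

    return [
        r for r in index
        if matches(subject, r.subject)
        and matches(verb, r.verb)
        and matches(obj, r.obj)
    ]


# "What does mygene regulate?" Leave the object slot open.
for hit in query(subject="mygene", verb="regulate"):
    print(hit.obj, "|", hit.sentence)
```

Leaving a slot empty is what makes this kind of search useful: you can ask "what causes cancer?" or "what does mygene do?" and read the evidence sentence for each hit.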

Kleio provides enhanced searching functionality by disambiguating alternative names and searching for all synonyms of the search term. For example, a search for ‘interleukin-1’ would also match texts containing the terms ‘IL1’ and ‘IL-1’.
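The synonym idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how Kleio is implemented: the `SYNONYMS` table is a hypothetical, hand-written stand-in for the curated dictionaries a real tool would use, and `expand_query` and `search` are made-up names.

```python
import re

# Hypothetical synonym table; keys are lower-case canonical names.
SYNONYMS = {
    "interleukin-1": ["interleukin-1", "IL1", "IL-1"],
}


def expand_query(term):
    """Compile a regex that matches the term or any of its listed synonyms."""
    variants = SYNONYMS.get(term.lower(), [term])
    # Try longer names first so the most specific variant wins.
    variants = sorted(variants, key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in variants)
    # The look-arounds stop 'IL-1' from matching inside 'IL-10'.
    pattern = r"(?<![A-Za-z0-9])(?:" + alternatives + r")(?![A-Za-z0-9])"
    return re.compile(pattern, re.IGNORECASE)


def search(term, texts):
    """Return the texts that mention the term under any known name."""
    rx = expand_query(term)
    return [t for t in texts if rx.search(t)]


texts = [
    "IL-1 promotes inflammation.",
    "IL1 levels rose after treatment.",
    "Interleukin-1 binds its receptor.",
    "IL-10 is anti-inflammatory.",
]
for hit in search("interleukin-1", texts):
    print(hit)
```

The last sentence is deliberately a near miss: a plain substring search for ‘IL-1’ would wrongly return the IL-10 sentence, which is the kind of ambiguity a synonym-aware tool has to handle.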

Finally, Facta searches for pairwise associations between related concepts. If you have questions like: ‘What diseases are relevant to a particular gene?’ or ‘What chemical compounds are relevant to a particular disease?’, then this tool may be able to help you. One particularly interesting feature is Facta’s ability to search for indirect associations that would not be immediately obvious from reading individual abstracts. This tool also highlights relevant text from the abstract to provide evidence for the associations it identifies.
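For the curious, pairwise and indirect associations can be illustrated with simple co-occurrence counting. The Python sketch below is a toy under stated assumptions, not Facta's actual algorithm: each abstract is assumed to have already been reduced to a list of concept names, and `cooccurrences`, `indirect_associations`, and the placeholder concepts are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrences(docs):
    """Count, for each pair of concepts, how many abstracts mention both."""
    counts = defaultdict(lambda: defaultdict(int))
    for concepts in docs:
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def indirect_associations(counts, query):
    """Find concepts never seen with `query` but linked via a shared concept."""
    direct = set(counts[query])
    found = defaultdict(set)
    for middle in direct:
        for target in counts[middle]:
            if target != query and target not in direct:
                found[target].add(middle)
    return {target: sorted(links) for target, links in found.items()}


# Placeholder data: each inner list stands for the concepts in one abstract.
docs = [
    ["geneA", "inflammation"],
    ["inflammation", "diseaseX"],
    ["geneA", "compoundY"],
]
counts = cooccurrences(docs)
print(dict(counts["geneA"]))
print(indirect_associations(counts, "geneA"))
```

Here no single abstract mentions both geneA and diseaseX, yet the two are connected through inflammation. That is the sort of link you would not spot by reading the abstracts one at a time.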

Of course, text mining is not perfect yet – the English language is so rich and varied that an idea can be expressed in a myriad of ways, not all of which are captured by the heuristic rules of the text-mining algorithm. But these tools are so easy and fast to use that they can be added to your literature-searching repertoire today!

Have you used any of these tools to search the literature? What other tools do you recommend?