An Easy Way to Start Using R in Your Research

Working with large datasets can be very frustrating and time-consuming. If only there were more tools out there to simplify things without needing to invest a PhD's worth of time to learn how to use them! I am here to tell you that there is a solution, and a free one at that.

If you are working with high-throughput techniques that produce large datasets, you might have heard of the R programming language. Maybe you even have some colleagues who use it, but they have told you that it is quite complicated, and you are too scared to give it a try.

In this article, I will give you some tips to help you lose your fear and start taking advantage of this extremely useful tool. In upcoming articles, we will give you step-by-step instructions for using R to analyze your data.

What is R anyway?

R is a programming language that is widely used for statistics and graphics. It is also becoming very popular in biology, thanks to the Bioconductor project, which provides R-based tools for the analysis of biological data.

Why is R so convenient?

It is completely free
It is open source, so its code is constantly checked by its users (it is so widely used that bugs and errors are reported quickly)
It copes well with large amounts of data that are painful to handle in a spreadsheet (have you ever tried to work with a 20,000-row list in Excel?)
It produces high-quality graphics
It has a big community of users, so you can easily get support online

Very nice, but how does it help me with my research?

Hundreds of packages are available from the Bioconductor project. For example, you can:

Get a list of strongly regulated genes from your microarray data
Cluster time-series data
Run a pathway or gene ontology analysis on any list of genes or proteins
Get an idea of which transcription factors might be regulated, based on a list of regulated genes
Check the quality of several types of data (sequencing, mass spectrometry, flow cytometry, microarrays, etc.)
Automate the analysis of high-throughput qPCR data
Create and simulate a mathematical model (Boolean, Bayesian, etc.)
Perform any statistical test on your data (that's why R was created in the first place)
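To give a flavor of that last point, here is a minimal sketch using only base R (no Bioconductor packages). It runs a two-sample t-test on ToothGrowth, a small example dataset that ships with R: odontoblast length (len) in 60 guinea pigs given vitamin C either as orange juice (OJ) or as ascorbic acid (VC). The choice of dataset is just for illustration; with your own data you would substitute your own data frame and column names.

```r
# 'ToothGrowth' ships with R (datasets package, loaded by default).
# len  = length of odontoblasts (tooth-growth cells) in each of 60 guinea pigs
# supp = how vitamin C was given: "OJ" (orange juice) or "VC" (ascorbic acid)
head(ToothGrowth)

# Two-sample t-test of len between the two supplement types.
# The formula 'len ~ supp' reads as "len grouped by supp"; by default
# t.test() does not assume equal variances (Welch's test).
# Note: this simple comparison ignores the 'dose' column.
result <- t.test(len ~ supp, data = ToothGrowth)

print(result)    # t statistic, degrees of freedom, p-value, confidence interval
result$p.value   # the p-value alone, for use in further code
```

Three lines of code replace what would otherwise be several spreadsheet formulas, and the same pattern works for any two-group comparison.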

There have to be some bad things as well…

You can do simple things "easily", but R is not intuitive. Be patient: it may take a couple of days before you can get it to work.

OK, you convinced me! How can I start?

If you have the opportunity to take a short introductory course at your university, don't hesitate to do it. The instructors will guide you through the first steps and help you when you get your first error messages (this is normal and part of the fun of starting).

But don’t worry, in case you have no other choice but to start on your own, there are several tools that can help you. My favorite is the R Studio Suite that makes using R much more intuitive and user-friendly. Simple options, like loading a data file, are built into the program so that you can do it with just one click (instead of typing a whole command line).

There are also several sites with free R tutorials for beginners, such as www.r-project.org.

So now you are ready to show off your “computer programming skills” amongst your colleagues who are still too afraid to try!

Look for our next article, which demonstrates the basics of entering and analyzing data in RStudio.

Marisa Fernández-Cachón

Marisa gained a PhD in Molecular Biology & Bioinformatics from Albert-Ludwigs-Universität Freiburg im Breisgau. She is currently a Bioinformatics consultant at Genedata, Basel, Switzerland.
