So, you come from a non-coding background, but your research is producing results so quickly that you have accrued a pile of data in need of mining.
Now, you can leave this to your bioinformatics core facility, but if you’re curious to understand how they arrive at their conclusions (or you don’t have a bioinformatics core facility), you might want or need to learn some coding yourself.
Bitesize Bio has published helpful articles to get you started with R, currently the most popular programming language for data analysis in the life sciences. These are very good for grasping the main ideas, and they provide useful pieces of code that you can copy to the console to perform statistical analysis. People usually suggest building on this knowledge with the R help, but in my experience, R help only really helps if you are already familiar with coding terminology. And that requires some training through practical assignments.
Here, I present to you a short review of several online courses on programming in R. In my experience, this should launch you from the very start to the level where you can actually understand R help and use it without having to google 60% of what’s written in it.
At Code School, the place where many people have their first encounter with code, you can also find a short course on R. Their motto is "learn by doing", so Try R offers an interactive, game-like environment in which you type code all the time, and it smoothly teaches you the basics of the language.
Level: The utter – I’ve never seen a console in my life – beginner
Outline: You have 8 levels to complete, each of which has a different number of challenges (from 9 to 39). After completing a level, you receive a badge, which makes it feel more like you are playing a game. The topics covered include the basic data structures (vectors, matrices, factors, data frames), and the course also introduces you to statistical testing using real-world data. It even explains more complex concepts, like appending files, so the learner becomes able to do practical work more quickly.
Timeline: If you are not interrupted, one hour will suffice.
A tip from the user: Go through it twice with a pause of one week. We learn better if we repeat things more often, but we rarely have time to review the lessons. Given that this course is not so time consuming, use the opportunity to review what you’ve learned.
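To give a feel for the data structures Try R covers, here is a minimal sketch in R. The values and names (`expression`, `group`, `samples`) are made up for illustration and are not taken from the course itself.

```r
# A numeric vector: hypothetical expression values for four samples
expression <- c(2.1, 3.4, 1.8, 4.0)

# A matrix: the same four values arranged in 2 rows and 2 columns
# (R fills a matrix column by column)
m <- matrix(expression, nrow = 2, ncol = 2)

# A factor: a categorical variable with a fixed set of levels
group <- factor(c("control", "treated", "control", "treated"))

# A data frame: columns of equal length that may hold different types
samples <- data.frame(group = group, expression = expression)

# Mean expression per group
group_means <- tapply(samples$expression, samples$group, mean)
print(group_means)
```

Typing each line into the console and inspecting the result with `str()` or `print()` is a quick way to see how the four structures differ.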
Level: Beginner (but with an idea of what R is and how it’s used)
Outline: The course is divided into 8 modules; each module has several lectures followed by lab practices. The labs come after each lecture, and they don’t let you go to the next step without completing the current level. Specifically, the course teaches you about the basic data structures (vectors, matrices, factors, data frames) and the operations with them. You can learn the basic syntax and data handling. The final module gives an overview of graphical capabilities.
Timeline: It is self-paced, which takes off the tension of deadlines. Still, you can finish it in two days, or in one focused 8-hour workday.
A tip from the user: Around the middle of the course, try an approach different from the recommended one: see if you can listen to all the lectures of a module first and then do the lab exercises. It might be more difficult, but the knowledge becomes functional faster.
Coursera is another MOOC platform which, as part of its Data Science Specialization, offers an extensive and comprehensive course on R, R Programming. Although it requires deep focus, it really does the job: you are fluent in R upon completion.
Level: The first week is good for beginners; the other three require a good understanding of programming concepts
Outline: There are several video lectures every week, plus a quiz that you can take to check your knowledge. Programming assignments start after the second week and require commitment. There are no built-in exercises, but you can install an interactive learning package, swirl, which explains the main commands through exercises. The first week starts with the history of R and covers all the basic objects and data handling. From the second week it gets more demanding, focusing on using and writing functions, as well as defining some complex concepts like scoping rules. The third week introduces loops and the split-apply-combine strategy, and the fourth week teaches you how to polish your code.
Timeline: Over the course of four weeks, about 10 hours per week is necessary. There are deadlines for both quizzes and programming assignments, which give an extra push to the learning process.
A tip from the user: You can listen to the first two weeks of the course, complete the quizzes and swirl exercises, and then take some time to grasp and apply what you have learned thus far. You can come back to the course in a month to finish the rest.
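The concepts named in the outline above (writing functions, scoping rules, loops, and split-apply-combine) can be sketched in a few lines of base R. This is only an illustration under my own assumptions: the function names `cv` and `make_scaler` and all the values are hypothetical and do not come from the course assignments.

```r
# Writing a function: coefficient of variation of a numeric vector
cv <- function(x) {
  sd(x) / mean(x)
}

# Scoping rules: the inner function remembers `factor_value`
# from the environment in which it was created (lexical scoping)
make_scaler <- function(factor_value) {
  function(x) x * factor_value
}
double_it <- make_scaler(2)
doubled <- double_it(c(1, 2, 3))

# A for loop: running totals of 1, 1+2, 1+2+3
totals <- numeric(3)
for (i in 1:3) {
  totals[i] <- sum(1:i)
}

# Split-apply-combine: split the values by group, apply mean() to
# each piece, and combine the results into one named vector
values <- c(2.1, 3.4, 1.8, 4.0)
group <- c("control", "treated", "control", "treated")
by_group <- split(values, group)
group_means <- sapply(by_group, mean)
print(group_means)
```

If you can read this block and predict what `doubled`, `totals`, and `group_means` contain before running it, you are probably ready for the later weeks.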
And some more advice
Try to finish all three courses, since they repeat many of the same concepts but from different perspectives. This gives you a broader picture of what coding in R means and how to reset your brain to think in programming mode. Start with the fastest, Try R, then move on to the first two weeks of R Programming, and while you’re still puzzling over how to solve its programming assignments, go through Intro in R to help you straighten things out. Then go and finish R Programming.
And don’t forget to brag that you learned to code.