Teach Yourself Python: A Guide for Biologists

Programming can be incredibly helpful to biologists and is becoming an increasingly important skill for researchers. However, we know that starting to teach yourself can be confusing and scary. If you want to learn Python at your own pace, you are in the right place. This article aims to provide the information you need to kick-start your Python self-learning journey. With practice, you can become a faster, more efficient researcher with a CV that stands out.

Why Should I Learn to Program?

Are you tired of repeatedly doing the same calculations for your data? Do you ever feel like the literature search is never-ending? Once you become a programming biologist you can easily automate all of these (and other) boring, mundane, time-sucking tasks and focus on other things! That is what motivated me to learn to program.

Ok, So Why Should I Learn Python?

With several coding languages out there, it can be daunting to figure out which is the right one for your purposes. I debated for a while between Perl, R, and Python. After learning the basics of all three, I went with Python for the following reasons:

  • One of the most widely used programming languages.
  • Vast community support.
  • Gentle learning curve.
  • Consistent and readable syntax.

Python is particularly well suited to researchers because several biology programmers have already contributed many libraries to make Python science-friendly. Python documentation also has a section dedicated to its scientific audience. Here are some more reasons why Python could be your best choice of programming language for biology research:

  • Widely used in the scientific community.
  • Well-built libraries for complex scientific problems.
  • Compatible with other existing tools.
  • Easy manipulation of sequences such as DNA, RNA, and amino acid sequences.
  • Easy data manipulation and visualization.
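
As a small illustration of the sequence-manipulation point above, here is a minimal sketch in plain Python 3 with no extra libraries. The function names (`reverse_complement`, `transcribe`, `gc_content`) and the example sequence are made up for illustration; libraries such as Biopython offer more complete tools for the same job.

```python
# Hypothetical helper names; only built-in string methods are used.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(dna):
    """Return the RNA version of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def gc_content(dna):
    """Return the fraction of bases that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


seq = "ATGGCCATTG"  # made-up example sequence
print(reverse_complement(seq))  # CAATGGCCAT
print(transcribe(seq))          # AUGGCCAUUG
print(gc_content(seq))          # 0.5
```

Each function is only a line or two long, which is a fair picture of how little code a routine sequence task tends to need.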

It is worth mentioning that there are two major Python versions: Python 2 and Python 3. Python 2 reached its end of life in January 2020 and no longer receives updates, while Python 3 is actively developed, with new features added regularly. New learners should therefore focus on Python 3.

Applications of Python in Biology

If next-generation sequencing (NGS) is your platform, then check out these earlier Bitesize Bio pieces on choosing the right language for NGS and choosing the right scripting language for NGS. But programming doesn’t end at NGS; it can be used for many more tasks, such as literature searches, manipulating DNA and protein sequences, and data analysis and visualization.

Biopython is an open-source library made for computation in bioinformatics. PyMed [1] is another library that can help researchers make consistent and readable batch search queries in PubMed, making literature searches a breeze. Python, with its libraries, is a powerful tool for manipulating, exploring, and visualizing complex data sets: Pandas [2] is used for data manipulation and seaborn [3] for data visualization. With Python, storing, organizing, analyzing, and displaying the large amounts of data you accumulate will no longer be scary.
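
To give a flavor of the data-handling point above, here is a rough sketch that averages replicate measurements per sample. It deliberately uses only Python's standard library (`csv`, `statistics`, `collections`) so that it runs without installing anything; Pandas can express the same grouping far more compactly. The column names (`sample`, `od600`) and the values are invented for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented example data; in practice this would be a CSV file on disk.
raw = io.StringIO(
    "sample,od600\n"
    "control,0.50\n"
    "control,0.54\n"
    "treated,0.30\n"
    "treated,0.34\n"
)

# Collect every reading under its sample name.
readings = defaultdict(list)
for row in csv.DictReader(raw):
    readings[row["sample"]].append(float(row["od600"]))

# Average the replicates for each sample, rounded to two decimals.
averages = {name: round(mean(values), 2) for name, values in readings.items()}
print(averages)  # {'control': 0.52, 'treated': 0.32}
```

Once a calculation like this lives in a script, rerunning it on next week's data is a matter of pointing it at a new file.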

Let’s Get Learning!

  1. Download Python [4] for your operating system (macOS/Windows).
  2. Codecademy [5] has an interactive course that will get you started with the basics.
  3. You don’t have to remember every piece of syntax. IDEs (Integrated Development Environments) like PyCharm [6] offer suggestions and autocomplete as you type. They also let you step through your code line by line (debugging), which is an essential part of learning.
  4. Automate the Boring Stuff with Python: Practical Programming for Total Beginners [7] is my favorite book. It gave me an in-depth understanding of everything that I needed to know in Python.
  5. I found Biology Meets Programming: Bioinformatics for Beginners on Coursera [8] very informative. This course quickly got me manipulating DNA and amino acid sequences with ease.
  6. Once you reach this step, the sky’s the limit. You can practice your skills on websites such as Rosalind [9] for bioinformatics problems, and Programming for Biologists [10] and Python for Biologists [11] for biology-based coding exercises.
  7. Have a question? You can ask it, or search for an existing answer, on Stack Overflow, which has a vast support community.
  8. I also recommend checking out the official Python documentation once you have a solid grasp of the basics.
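
To give a sense of what an early practice exercise might look like, here is a short sketch. It is written in the style of an introductory bioinformatics problem and is not taken from any of the sites listed above; the function names and sequences are hypothetical.

```python
from collections import Counter


def count_bases(dna):
    """Count how many times each base appears in a DNA string."""
    return Counter(dna.upper())


def hamming_distance(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


counts = count_bases("AGCTTTTCATTCTGAC")  # made-up sequence
print(counts["A"], counts["C"], counts["G"], counts["T"])  # 3 4 2 7
print(hamming_distance("GAGCCTACT", "CATCGTAAT"))          # 4
```

Stepping through a small function like `hamming_distance` in an IDE debugger is a good way to watch the loop compare one pair of bases at a time.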

Should I Learn R as Well as Python?

R is widely used for statistical computing and graphics; it produces publication-quality plots with mathematical symbols and formulae. Python, however, is a general-purpose language, which makes it the more versatile choice for many researchers. Combined with the libraries written for the scientific community, Python can be handy for a biologist’s day-to-day needs.

Python has changed biology for me and made even tedious things quite interesting. Is Python going to be your first coding language, or do you already have other coding languages in your pocket? Share your coding learning path and tips in the comments below!

References

  1. Wobben G. pymed: Python library for access to PubMed. Accessed August 05, 2020.
  2. Pandas. Accessed August 10, 2020.
  3. Seaborn. Accessed August 10, 2020.
  4. Python.org. Download Python. Accessed July 29, 2020.
  5. Codecademy. Learn Python 3. Accessed July 27, 2020.
  6. JetBrains. PyCharm: the Python IDE for Professional Developers by JetBrains. Accessed July 27, 2020.
  7. Sweigart A. Automate the boring stuff with Python: practical programming for total beginners. San Francisco: No Starch Press; 2015.
  8. Coursera. Biology Meets Programming: Bioinformatics for Beginners. Accessed July 29, 2020.
  9. ROSALIND | Problems | Locations. Accessed July 27, 2020.
  10. Exercises · Programming for Biologists. Accessed July 29, 2020.
  11. Exercise files. Python for Biologists. Accessed July 29, 2020.