Science With Data

Intro to programming for Quantitative Biology.

Author

Otho Mantegazza

1 Welcome

During your master’s program you will approach biology from a quantitative point of view.

Some aspects of the biological sciences have been quantitative since their start. Genetics, for example, has been quantitative from the beginning, when Gregor Mendel discovered the fundamental laws of inheritance by combining experimental work with statistical analysis.

So, chances are that you are not completely new to the quantitative approach. For those who start from zero instead, don’t worry, it’s fun.

2 Two Changes in Mindset

Switching to a quantitative mindset requires you to take two steps at the same time. One step is learning to deal with numbers and data; the other is learning computer programming, so that you can automate your work on data with languages such as Python or R (or Julia, or JavaScript). In theory, you can take the first step without the second, but today it is much more practical to learn about data and programming together.

When you work with data, so many steps can be automated through programming that, without it, you would hardly achieve anything useful within a reasonable time.

3 Open Source, Open Access

Most computational tools for quantitative disciplines, such as R and Python, have been developed with an open source approach.

This means that the software that you’ll use is free, and that you are free to:

  • Run the programs.
  • Study how the programs work.
  • Redistribute copies of the programs.
  • Modify and improve the programs and release them again to the public.

For example, you can access, copy, and propose modifications to:

4 Install R and Python

In this master’s program you will focus on learning the programming language Python. If you later work in a quantitative, data-intensive environment, though, chances are that you will also use R.

The “R or Python” dilemma is a long-standing one in data science. The two programming languages are mostly interchangeable, and both are used heavily in data-intensive environments.

The best choice is to learn and become proficient in at least one of the two. When you know one, the other comes easily, like learning Spanish when you already speak Italian.

4.1 R

The easiest way to use R is to install it locally.

  1. Download and install R from CRAN.
  2. Download and install the RStudio IDE.

4.2 Python

Python can also be installed locally.

But I personally prefer to run it in an isolated Docker container, so that I can try multiple versions without dealing with conflicts.

So, to run Python you have several options:

4.2.1 Run Python remotely on Google Colab

The easiest way to try Python is to run it remotely on Google’s servers, through Google Colab.

Just open a Colab notebook and run some code.
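If you are unsure what to type in your first notebook cell, the sketch below is one possibility. The measurements are made-up values used only for illustration; the code relies on built-in Python functions, so it should run in any notebook or Python console.

```python
# Hypothetical plant heights in centimeters (made-up values for illustration).
heights_cm = [12.5, 14.0, 13.2, 15.1]

# Compute the mean by hand: the sum of the values divided by how many there are.
mean_height = sum(heights_cm) / len(heights_cm)

print(f"Measured {len(heights_cm)} plants, mean height: {mean_height:.2f} cm")
```

Try changing the numbers in the list and running the cell again to see how the result changes.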

4.2.2 Locally

You can install Python locally by downloading it from its main website.

If you use Linux or macOS, Python is often already installed on your computer. Open a terminal and type python or python3 to start it.

4.2.3 In a Docker Container

I personally use this option. It requires you to install both the IDE Visual Studio Code and the virtualization platform Docker.

This way, you run Python in a Docker container that is isolated from your main operating system, and you connect to it from Visual Studio Code.

This setup is done through a Dev Container. It’s easier than it sounds; you can find an example on this page, which includes a video.

To try this setup:

  1. Install Docker.
  2. Install Visual Studio Code.
  3. Install the Dev Containers extension (formerly called Remote - Containers) for Visual Studio Code.
  4. Download this GitHub repo and open it with Visual Studio Code.

5 Let’s start programming

Break the ice and write some code.

  1. Open a script in RStudio and copy this code into it. Run it in the R console to get a nice data visualization out of it.

  2. Some Python code is already in the GitHub repository that you downloaded previously. Open it in Visual Studio Code, find the hello.ipynb file, and run the code stored in it.

In the first example we run code from a script in the R console; in the second we run code from a notebook.

6 Flexibility

When you take your first steps into programming and open source, you’ll find that there is a lot of flexibility in how to achieve something, and that there is no absolute best way to carry out a task. For example, you can evaluate R code in RStudio, in your computer’s terminal, or in the Visual Studio Code IDE.

Scripts are simply text files, and nothing stops you from turning an R script into a Python script by changing its extension from .R to .py, even by mistake. Of course, the content stays the same, so its syntax would be wrong for Python.
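To make this concrete, here is a minimal sketch that writes one line of R code to a temporary file, renames the file from .R to .py, and then asks Python to parse it. The file name analysis.R and the R snippet are hypothetical, chosen only for this demonstration.

```python
import tempfile
from pathlib import Path

# One line of valid R code (hypothetical script content for this demo).
r_code = "for (i in 1:3) print(i)\n"

with tempfile.TemporaryDirectory() as tmp:
    r_script = Path(tmp) / "analysis.R"
    r_script.write_text(r_code, encoding="utf-8")

    # Renaming only changes the file name; the text inside stays the same.
    py_script = r_script.rename(r_script.with_suffix(".py"))
    same_content = py_script.read_text(encoding="utf-8") == r_code

    # Python cannot parse R syntax, whatever the extension says.
    try:
        compile(py_script.read_text(encoding="utf-8"), str(py_script), "exec")
        is_valid_python = True
    except SyntaxError:
        is_valid_python = False

print("Content unchanged after renaming:", same_content)
print("Parses as Python:", is_valid_python)
```

The extension is only a hint for you and for your tools; what matters is the text inside the file.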

Tabular data often come in CSV format. CSV files are also basically text files, with only loose rules defining what’s a value, what’s a cell, what’s a row and what’s a column. You can open them in any text editor, or you can parse them and load them into an R or Python object to use and manipulate them, as long as you get the encoding rules right.
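As a small sketch of what parsing means here, the example below reads a tiny CSV with the csv module from the Python standard library. The data are made up for illustration, and UTF-8 is assumed as the text encoding; real files may use a different encoding or separator.

```python
import csv
import io

# A tiny, made-up CSV: commas separate cells, line breaks separate rows.
# The last row quotes a value because that value contains a comma itself.
csv_text = 'species,height_cm\n"Arabidopsis thaliana",12.5\n"Rice, cultivated",95.0\n'

# Files on disk are bytes; the encoding (UTF-8 assumed here) says how to read them as text.
csv_bytes = csv_text.encode("utf-8")
decoded = csv_bytes.decode("utf-8")

# DictReader uses the first row as column names and yields one dict per data row.
rows = list(csv.DictReader(io.StringIO(decoded)))

# Every parsed value is still a string; numbers must be converted explicitly.
heights = [float(row["height_cm"]) for row in rows]

for row in rows:
    print(row["species"], row["height_cm"])
```

Note how the quoted comma in the last row stays inside a single cell instead of splitting it in two; this is the kind of loose rule that a parser handles for you.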

This flexibility can be disorienting at first. Don’t worry, you’ll get used to it.

7 Learn More

Most programming languages have been built with an open source mindset. Making these tools available to everyone has been a priority, but even the best tool is useless if people don’t know what to do with it.

7.1 Documentation

R, Python, and their best packages come with extensive documentation. For example, check the Python package scikit-learn; its website details how to use the functions in the package, with a narrative user guide, technical documentation and extensive examples.

The R framework tidymodels, which covers the same scope as scikit-learn, also has extensive documentation.
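Documentation is often available from inside the language as well. As a minimal sketch, the example below reads the built-in documentation of a function from the Python standard library (statistics.mean, chosen here only as an example, not scikit-learn itself).

```python
import inspect
import statistics

# Fetch the documentation text that ships with statistics.mean.
doc = inspect.getdoc(statistics.mean)

# Show only the first line, which summarizes what the function does.
summary = doc.splitlines()[0]
print(summary)

# In an interactive session, help(statistics.mean) displays the full text.
```

The same approach works for most functions in installed packages, as long as their authors wrote documentation strings for them.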

7.2 Online communities

Stack Overflow has millions of questions with high-quality answers that developers all around the world use and curate every day.

Generally, while programming, Google is your friend.

7.3 Books

To approach a new topic, you need not only to get questions answered, but also to discover which questions can be asked.

Books are great for that.

In the resources section of this website you can find a collection of open access books that you can consult online freely. Enjoy!