R versus Python

May 11, 2015

Which one should you learn first?

Between the two superstar statistical languages, R and Python, which one should you choose to learn? In this post, we give a few guidelines to help you choose between the two, or learn both!

R and Python logos

Python is an all-purpose scripting language first and a statistical language second. You can do all types of tasks with Python.

Recent developments like scikit-learn and pandas have made Python a perfectly good language for data analysis.

Pandas helps you preprocess your data. Scikit-learn provides a neat interface to a large variety of machine learning models (i.e., you can train different models using the same syntax).
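To make the "same syntax" point concrete, here is a minimal sketch. The dataset, its column names (`age`, `income`, `bought`), and the choice of the two models are made up for illustration; they do not come from a real project. Pandas fills in the missing values, then two unrelated scikit-learn models are trained and queried with exactly the same `fit` and `predict` calls.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up dataset (hypothetical values, for illustration only).
raw = pd.DataFrame({
    "age": [22, 35, None, 58, 41, 29],
    "income": [1800, 3200, 2500, None, 4100, 2300],
    "bought": [0, 1, 0, 1, 1, 0],
})

# Preprocessing with pandas: replace missing values by each column's median.
features = raw[["age", "income"]]
features = features.fillna(features.median())
target = raw["bought"]

# Two very different models, trained and queried with the same calls.
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]
predictions = {}
for model in models:
    model.fit(features, target)
    predictions[type(model).__name__] = model.predict(features)

for name, predicted in predictions.items():
    print(name, list(predicted))
```

Swapping in another scikit-learn estimator should only require changing the entries of the `models` list; the loop itself stays the same.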

Additionally, the documentation of scikit-learn is fantastic. The API is clearly described, and there are also additional posts that read more like machine learning courses than simple documentation.

Other Python libraries such as seaborn and ggplot (originally an R package!) make it easy to visualize your data. Since Python is an all-purpose language, it is handy for building a complete pipeline, from data preparation through prediction to website integration.

By contrast, R has been specifically designed for statistics and graphics. Tools for preprocessing, producing data graphs, and computing predictions are available in the base installation.

R’s strength resides in the plethora of packages devoted to specific statistical applications, developed by its large community (currently, more than 6000 packages are available).

But there is one drawback to this advantage. On the one hand, you have a rich spectrum of models at your disposal; on the other hand, each package has its own authors and hence its own interface. Each time you want to use a new model, you will have to study a new interface. However, the range of models available in R is more comprehensive than in Python, and it can be useful to have the two tools at your disposal when doing your analysis.

As a first step, I would recommend learning Python (see our Getting started with Python post), bearing in mind that an R package could implement just that missing model you need so badly. With this in mind, note that the interactive notebook built into DSS (based on the IPython notebook) gives you a great interface for working with these two languages. You can even call an R function from Python code!