
Using the IPython/Jupyter Notebook in DSS

August 20, 2015

The IPython Notebook is a favorite tool of many data scientists. It gives users an ideal environment for interactively analyzing datasets directly from their web browser, combining code, graphical output, and rich content in a single place.

Because of these features, the IPython Notebook is embedded in Data Science Studio (DSS) and tightly integrated with its other components.

Working with IPython Notebooks

There are two main ways to create an IPython Notebook in Data Science Studio:

  • If you want to create a completely "blank" Notebook, navigate to the Notebook section from the top navigation bar and click the "New notebook" button. You can choose between a regular Notebook using a Python kernel and a Notebook based on an R kernel.

  • Very often, while building your data science workflow, you need to explore new datasets. To use a Notebook in this case, go to the Flow screen, click on the dataset you want to analyze, then in the right panel click on the Python or R icon and select "Notebook".

(Note that this functionality is also available from the Actions menu on a Dataset.)

This automatically opens a new Notebook pre-filled with some minimal code that uses Dataiku's API to read your dataset into Python or R structures (such as data frames).

You can uncomment the relevant portions of the code depending on whether you need to load your dataset into a pandas DataFrame or iterate through the rows of your dataset.
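To give a rough idea of the two options, here is a minimal sketch in Python. It is not the exact code that DSS pre-fills. It assumes the `dataiku` package that is available inside DSS Notebooks, and the helper names `load_as_dataframe` and `first_rows`, as well as the dataset name `my_dataset`, are hypothetical and chosen only for illustration.

```python
from itertools import islice


def load_as_dataframe(dataset_name, limit=None):
    """Load a DSS dataset into a pandas DataFrame (optionally only the first rows)."""
    # Assumption: this runs inside DSS, where the `dataiku` package is importable.
    import dataiku

    dataset = dataiku.Dataset(dataset_name)
    return dataset.get_dataframe(limit=limit)


def first_rows(dataset_name, max_rows=5):
    """Stream rows one by one and keep only the first `max_rows` of them."""
    # Assumption: same DSS-only import as above.
    import dataiku

    dataset = dataiku.Dataset(dataset_name)
    # iter_rows() yields rows lazily, so the full dataset is never held in memory.
    return list(islice(dataset.iter_rows(), max_rows))


# Inside a DSS Notebook, with a hypothetical dataset called "my_dataset":
# df = load_as_dataframe("my_dataset", limit=1000)
# df.head()
# rows = first_rows("my_dataset", max_rows=3)
```

Loading into a DataFrame is usually the most convenient choice for interactive exploration, while iterating over rows can help when the dataset is too large to fit comfortably in memory.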

Whether you work with R or Python in your Notebook, you can easily load your datasets using Dataiku's API, whatever their original source or storage system. You can read more about our APIs in the DSS documentation.

Creating an Insight from an IPython Notebook

In Data Science Studio, Insights are used by data scientists to share their work, notably via the Dashboard. An IPython Notebook can be published as an Insight in DSS, which lets you share your document with other users in HTML format.

Once your Notebook is ready, close it and go back to the Notebooks list. Click on the Notebook you want to use and, under "Actions" in the right panel, click on "Publish".

Once the Insight is generated, you are taken to the Dashboard, where the Notebook is shown as a new Insight. Double-click on the top bar of its window to open it.

Generating a Notebook from a Model

Finally, another very useful feature is the ability to create an IPython Notebook directly from a trained machine learning Model. Let's say you have trained a model: from the caret next to the Deploy menu, you can choose to export it to an IPython Notebook.

This creates a new Notebook filled with all the settings and steps required to reproduce the machine learning build from the UI, but directly in Python. Isn't it great?

Conclusion

IPython Notebooks are first-class citizens in DSS. They are in the toolbox of most data scientists, and they make a great environment for interactively analyzing your datasets using R or Python. If you want to see what's next for IPython Notebooks, check out the Jupyter project!