The IPython Notebook is a favorite tool for many data scientists. It gives users an ideal environment for interactively analyzing datasets directly from their web browser, combining code, graphical output, and rich content in a single place.
Because of all these nice features, the IPython Notebook is embedded in Dataiku DSS, and tightly integrated with other components.
There are two main ways to create an IPython Notebook in Dataiku DSS:
(Note that this functionality is also available from the Actions menu on a Dataset.)
This will automatically open a new Notebook pre-filled with some minimal code, which lets you use Dataiku’s API to read your datasets into Python or R structures (such as data frames). For instance:
You can uncomment the portions of code depending on whether you need to load your dataset into a Pandas dataframe, or iterate through the lines of your dataset.
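The pre-filled code itself is not reproduced here. As a rough illustration only, the two loading styles mentioned above (a Pandas dataframe versus row-by-row iteration) could be sketched as follows. This assumes the `dataiku` Python package, which is only importable inside a DSS environment. The helper `read_dataset`, its `dataset_cls` parameter, and the dataset name `my_dataset` are hypothetical names introduced for this sketch and are not part of the generated Notebook.

```python
try:
    import dataiku  # only available inside a Dataiku DSS environment
except ImportError:
    dataiku = None


def read_dataset(name, as_dataframe=True, dataset_cls=None):
    """Read a DSS dataset as a dataframe or as an iterator over rows.

    `dataset_cls` is a hypothetical hook added for this sketch so the
    function can be exercised outside DSS; by default it falls back to
    dataiku.Dataset.
    """
    if dataset_cls is None:
        if dataiku is None:
            raise RuntimeError("The dataiku package is not available outside DSS")
        dataset_cls = dataiku.Dataset

    dataset = dataset_cls(name)
    if as_dataframe:
        # Load the whole dataset into a Pandas dataframe
        return dataset.get_dataframe()
    # Stream the dataset one row at a time instead of loading it all
    return dataset.iter_rows()


# Inside a DSS Notebook ("my_dataset" is a placeholder name):
# df = read_dataset("my_dataset")
# for row in read_dataset("my_dataset", as_dataframe=False):
#     print(row)
```

Loading a dataframe is usually the more convenient option for interactive analysis, while iterating over rows avoids holding the whole dataset in memory.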
Whether you work with R or Python in your Notebook, you can easily load your datasets using Dataiku’s API, whatever their original source or storage system. You can read more about our APIs there.
In Dataiku DSS, Insights are used by data scientists to share their work, notably via the Dashboard. An IPython Notebook can be used to create an Insight in DSS, which lets you share your document with other users in an HTML format.
Once your Notebook is created, close it and go back to the Notebooks list. Click on the Notebook you want to use, and under “Actions” in the right panel, click on “Publish”:
Once the Insight is generated, you are taken to the Dashboard, where the Notebook is shown as a new Insight. Double-click on the top bar of the window to open it:
Finally, another very interesting feature is the ability to create an IPython Notebook directly from a trained machine learning Model. Let’s say you have trained a model: from the caret next to the Deploy menu, you can choose to export it to an IPython Notebook:
This will create a new Notebook filled with all the settings and steps required to reproduce, directly in Python, the machine learning build you made from the UI. Isn’t it great?
IPython Notebooks are first-class citizens in DSS. They are in the toolbox of most data scientists, and they make a great environment for interactively analyzing your datasets using R or Python. If you want to see what’s next for IPython Notebooks, check out the Jupyter project!