Creating charts

Applies to DSS 1.0 and above | February 24, 2017

It is easy to create a variety of useful charts in DSS, as you have already seen in Tutorial: Basics.

Let’s say that the Haiku T-Shirts company wants to understand more about their typical order size: they know from experience that most customers order a single shirt, but they do occasionally get larger orders. What they don’t know is whether these larger orders constitute a significant portion of their business, and whether certain categories of shirts are more likely to be ordered in larger quantities.

How to create charts

To create a chart from the haiku_shirt_sales data:

  • Add Count of records to the Y axis, then add nb_tshirts and category to the X axis.

Creating a simple bar chart

The resulting chart shows us that using 10 equal-width bins loses a lot of information, because all orders of 1-5 shirts are clumped together.

A simple bar chart
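To see why equal-width binning hides detail, here is a minimal sketch in plain Python. It is not how DSS computes its bins internally; the order sizes are made-up values (not the real haiku_shirt_sales data) and `equal_width_bin` is a hypothetical helper written for this illustration.

```python
from collections import Counter

# Hypothetical order sizes (made-up values, not the real haiku_shirt_sales data).
nb_tshirts = [1, 1, 1, 1, 2, 1, 3, 1, 5, 1, 2, 12, 1, 42, 1, 4]

def equal_width_bin(value, lo, hi, n_bins):
    """Return the 0-based index of the equal-width bin that contains value."""
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value lands in the last bin, not a new one.
    return min(int((value - lo) / width), n_bins - 1)

lo, hi = min(nb_tshirts), max(nb_tshirts)

# Count of records per bin (what a binned X axis shows).
binned_counts = Counter(equal_width_bin(v, lo, hi, 10) for v in nb_tshirts)

# Count of records per raw value (what "use raw values" shows).
raw_counts = Counter(nb_tshirts)

print("binned:", sorted(binned_counts.items()))
print("raw:   ", sorted(raw_counts.items()))
```

With these values each bin is about 4 shirts wide, so every order of 1 to 5 shirts lands in the first bin, while the raw counts keep each order size separate.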

So let’s break the display of nb_tshirts down into raw values:

  • Click on the nb_tshirts label and select None, use raw values.

  • Create a filter to remove hoodies from the chart.

Changing the default binning and adding a filter to a bar chart

The vast majority of orders were for 1 shirt; in terms of the number of orders, larger orders are not a significant portion of Haiku T-Shirts’ business. From the scale of the X axis, we can see that at least one person placed an order of close to 40 t-shirts, but the count of such orders is too small to see on the chart, relative to the number of orders for 1 shirt.

An intermediate bar chart
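The two steps above (raw values plus a filter on hoodies) amount to a filtered group-and-count. The sketch below reproduces that logic outside DSS, assuming a handful of made-up rows; the category labels and order sizes are illustrative assumptions, not values taken from the dataset.

```python
from collections import Counter

# Hypothetical rows (made-up values); category labels are assumptions.
orders = [
    {"category": "White T-Shirt M", "nb_tshirts": 1},
    {"category": "Black T-Shirt F", "nb_tshirts": 1},
    {"category": "Hoodie", "nb_tshirts": 1},
    {"category": "White T-Shirt M", "nb_tshirts": 1},
    {"category": "Black T-Shirt F", "nb_tshirts": 2},
    {"category": "Hoodie", "nb_tshirts": 3},
    {"category": "White T-Shirt M", "nb_tshirts": 38},
]

# Filter: remove hoodies before counting.
kept = [row for row in orders if row["category"] != "Hoodie"]

# Count of records per (raw order size, category) pair, one bar segment each.
counts = Counter((row["nb_tshirts"], row["category"]) for row in kept)

# Count of records per raw order size, i.e. the full height of each bar.
orders_per_size = Counter(row["nb_tshirts"] for row in kept)

for size in sorted(orders_per_size):
    print(size, orders_per_size[size])
```

Even in this toy data the bar for 1 shirt dwarfs the single large order, which is the same scale problem we see in the chart.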

In order to get a better view of the categories by order size:

  • Click on the chart type selector and choose Stacked 100%.

  • Drag tshirt_price and total to the Tooltip area. On the total dropdown, select Sum. This adds summary statistics to your tooltips.

Changing the chart type to a Stacked 100% chart to get a different view of the data, and adding a tooltip

Now we can easily see that the proportion of sales by category appears to differ by order size. By hovering over bars in the chart, we can see, for example, that while women’s black T-shirts account for a greater and greater proportion of sales as the order size increases from 1 to 5 shirts, the total value of the orders decreases.

Visualizing how proportion of sales by category changes with order size
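A Stacked 100% chart rescales each bar so that its segments sum to 100%, and the tooltip adds an aggregate such as the sum of total. As a rough sketch of that arithmetic (not the DSS implementation), assuming a few made-up rows with hypothetical category labels and totals:

```python
from collections import defaultdict

# Hypothetical rows (made-up values): category, order size, and order total.
orders = [
    {"category": "Black T-Shirt F", "nb_tshirts": 1, "total": 20.0},
    {"category": "White T-Shirt M", "nb_tshirts": 1, "total": 18.0},
    {"category": "White T-Shirt M", "nb_tshirts": 1, "total": 18.0},
    {"category": "White T-Shirt M", "nb_tshirts": 1, "total": 18.0},
    {"category": "Black T-Shirt F", "nb_tshirts": 2, "total": 40.0},
    {"category": "White T-Shirt M", "nb_tshirts": 2, "total": 36.0},
]

counts = defaultdict(lambda: defaultdict(int))  # order size -> category -> count
sum_total = defaultdict(float)                  # (order size, category) -> sum of total

for row in orders:
    counts[row["nb_tshirts"]][row["category"]] += 1
    sum_total[(row["nb_tshirts"], row["category"])] += row["total"]

# Normalize each bar: every category's count divided by the bar's total count.
shares = {
    size: {cat: n / sum(by_cat.values()) for cat, n in by_cat.items()}
    for size, by_cat in counts.items()
}

for size in sorted(shares):
    for cat, share in sorted(shares[size].items()):
        print(size, cat, f"{share:.0%}", sum_total[(size, cat)])
```

Because every bar is normalized to the same height, order sizes with very few records become as visible as the bar for 1 shirt, which is why the tooltip totals are useful for keeping the absolute amounts in view.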

Whether these visual differences represent a statistically significant pattern that the Haiku T-Shirts company can exploit is a question we’ll leave for further analysis, because there is always a next step in data science!

DSS charts are portable. You can download them as an image (PNG) or an Excel document.

Downloading a chart from Dataiku

Which data is used by charts, and where computations take place

There are two places where you can create charts in DSS:

  • in an Analysis (using the Lab)
  • on a Dataset

Both Analyses and Datasets give you control over which data your chart is created with – sampled or complete.

Changing the sampling scheme for the data used to create charts

We strongly recommend that, unless you have a relatively small dataset, you use a sample for building interactive charts in Analyses. This is because an Analysis is intended for exploration and quick visual feedback, and thus always uses the in-memory DSS engine.
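To get a feel for the trade-off between a sample and the complete data, here is a small sketch under stated assumptions: the column of order sizes is made up, and the sample is simply the first 5,000 records, which is only one possible sampling scheme and is not meant to describe how DSS samples.

```python
from collections import Counter
from itertools import islice

# Hypothetical column of order sizes (made-up values): a repeating pattern
# of small orders, plus one rare large order at the very end.
full_column = [1, 1, 2, 1, 1, 3, 1, 1, 2, 1] * 1000 + [40]

# Assumed sampling scheme for this sketch: keep only the first 5,000 records.
sample = list(islice(full_column, 5000))

sample_counts = Counter(sample)
full_counts = Counter(full_column)

# Share of single-shirt orders, estimated from the sample and from all data.
share_of_ones_sample = sample_counts[1] / len(sample)
share_of_ones_full = full_counts[1] / len(full_column)

print(f"sample: {share_of_ones_sample:.3f}, full: {share_of_ones_full:.3f}")
print("largest order in sample:", max(sample), "| in full data:", max(full_column))
```

The sample gives nearly the same proportions at a fraction of the memory cost, but it can miss rare values such as the single large order, so it is worth switching to the complete data when the tail matters.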

When building charts on a Dataset, however, you can also use an in-database or in-cluster engine, depending on the location of the original data. See the following page for additional information on sampling and engines for charts.