Sample project

Use visual data preparation to clean a dataset

April 01, 2016

This project is based on a fictional dataset built by Robert Dempsey. We strongly recommend reading the blog post he wrote describing his project.

Business Goal

We are going to use the visual preparation recipes in DSS to clean and restructure a list of contacts so that the data becomes usable.

We want to:

  1. clean the contact names and split them into first names and last names
  2. clean phone numbers
  3. clean postal addresses
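
In DSS these goals are reached with visual recipes, without writing code. As a rough illustration of what the first two goals involve, here is a minimal Python sketch. The helpers `split_name` and `clean_phone` are hypothetical, and so are their rules: we assume that the last word of a name is the last name and that phone numbers are 10-digit North American numbers, optionally prefixed with the country code 1. These are not necessarily the rules used in this project's recipes.

```python
import re
from typing import Optional, Tuple


def split_name(full_name: str) -> Tuple[str, str]:
    """Split a raw contact name into (first_name, last_name).

    Assumption (hypothetical rule): the last whitespace-separated token is
    the last name and everything before it is the first name.
    """
    # split() with no argument also collapses runs of whitespace
    parts = [p.capitalize() for p in full_name.split()]
    if not parts:
        return ("", "")
    if len(parts) == 1:
        return (parts[0], "")
    return (" ".join(parts[:-1]), parts[-1])


def clean_phone(raw: str) -> Optional[str]:
    """Normalize a phone number to XXX-XXX-XXXX, or return None.

    Assumption: 10-digit North American numbers, optionally prefixed
    with the country code 1.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, parentheses, "+"
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the leading country code
    if len(digits) != 10:
        return None  # cannot be interpreted under our assumption
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


print(split_name("  robert   DEMPSEY "))  # ('Robert', 'Dempsey')
print(clean_phone("(555) 123-4567"))      # 555-123-4567
print(clean_phone("+1 555.123.4567"))     # 555-123-4567
print(clean_phone("12345"))               # None
```

Note that `str.capitalize` lowercases the rest of each word, so names such as "McDonald" would need extra handling in a real cleaning step.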

Explore this sample project

  • Flow

    Take a look at the data pipeline (the flow) to see the successive cleaning recipes.
    Note that although this project splits the cleaning into separate steps to keep it readable, you can combine all of them in a single recipe to improve performance.

  • Preparation recipe

    Look at the recipes to see precisely how to clean the addresses. 

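
To illustrate why merging the steps can help, the hypothetical sketch below contrasts two ways of running the same steps. The first runs each cleaning step as its own full pass over the rows, building an intermediate result in between, which is roughly one recipe per step. The second applies every step to each row in a single pass, which is roughly one recipe. This is only a plain-Python analogy, assuming the extra cost comes from producing and re-reading intermediate data. It does not reproduce how DSS executes recipes, and the step functions and column names (`city`, `state`) are made up for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[Row], Row]


def trim_all(row: Row) -> Row:
    # Strip surrounding whitespace from every value
    return {k: v.strip() for k, v in row.items()}


def title_case_city(row: Row) -> Row:
    # Normalize capitalization of the hypothetical "city" column
    return {**row, "city": row.get("city", "").title()}


def upper_state(row: Row) -> Row:
    # Upper-case the hypothetical "state" column
    return {**row, "state": row.get("state", "").upper()}


STEPS: List[Step] = [trim_all, title_case_city, upper_state]


def run_separately(rows: List[Row], steps: List[Step]) -> List[Row]:
    # One full pass per step; each pass builds a new intermediate list,
    # much like one recipe per step writing its own output dataset
    for step in steps:
        rows = [step(r) for r in rows]
    return rows


def run_fused(rows: List[Row], steps: List[Step]) -> List[Row]:
    # A single pass: all steps are applied to a row before moving to the next,
    # much like grouping all steps in one recipe
    out = []
    for r in rows:
        for step in steps:
            r = step(r)
        out.append(r)
    return out


rows = [{"city": "  new york ", "state": " ny"}]
print(run_fused(rows, STEPS))  # [{'city': 'New York', 'state': 'NY'}]
```

Both functions return the same rows. Only the number of passes over the data and the number of intermediate copies differ.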

Ready to enter Dataiku DSS?

If you have never used DSS, it is worth familiarizing yourself with the DSS concepts first.

This sample is already available in your DSS!

From your DSS home page, click on "Sample projects".

If your DSS server doesn't have Internet access, you can download this sample and import it manually (click on "Import project").
