sample project

Model energy consumption and predict power peaks

April 01, 2016

We built this project from a data science competition hosted by datascience.net.

Project Goal

The goal of the project is to model the energy consumption of various sites and predict consumption peaks. The original data is from more than 40 French consumer sites. The data in this project has been simplified and contains records for only 3 sites.

How we do this

This project uses two data sources, both of which you can inspect:

  • sites: this dataset contains information about the three consumer sites from which the data was collected
  • data: this dataset contains energy-use records from the three sites, with one record every 10 minutes.

In the data dataset you'll find these features:

  • DATE_LOCAL: date of the record
  • ID01, ID18, ID31: one column per site, valued 1 if the record is for that site, 0 if not
  • consomation: energy consumption (in kW)
  • temperature: temperature (in degrees Celsius)
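The three indicator columns encode the site as one flag per site. As a minimal sketch of how such a record can be read back into a single site label, assuming the columns hold the integers 0 and 1 (the record values, the date format, and the helper name `site_of` are made up for illustration):

```python
SITE_COLUMNS = ("ID01", "ID18", "ID31")


def site_of(record):
    """Return the name of the single site column that is set to 1."""
    active = [col for col in SITE_COLUMNS if record[col] == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one site flag, got {active}")
    return active[0]


# Hypothetical record; the values and the date format are assumptions.
record = {
    "DATE_LOCAL": "2011-01-01 00:10",
    "ID01": 0,
    "ID18": 1,
    "ID31": 0,
    "consomation": 42.0,
    "temperature": 3.5,
}
print(site_of(record))
```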

Explore this sample project

  • Dashboard

    Start by taking a look at the charts we built on the dashboard. You can already see differences between the three sites:
    - ID01 has a very high energy consumption, all the time, even during nights and weekends. It could be an industrial site that works 24/7.
    - ID18 and ID31 seem to have a consumption peak at night.
    - ID18 has a higher consumption when the temperature is low, ID31 has a higher consumption when the temperature is high. 

  • Flow

    Have a look at the flow to understand how the site data is prepared, then modeled and scored.

  • Data preparation 1

    The first recipe combines the two datasets (LEFT JOIN) and cleans some variables.

  • Data preparation 2

    Then, the data is split into two datasets: records for the year 2011 and records for the year 2012. We do this because we want to train a model on the historical data from 2011 and apply it to the 2012 data to make predictions.

  • Model

    A Random Forest algorithm is trained and used for the prediction. 

  • MAPE results

    Then, an R recipe calculates the MAPE (Mean Absolute Percentage Error) score globally and individually for each site. The model performs better for ID01. This is to be expected, since this site's consumption is more consistent over time and across temperatures.

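The preparation and scoring steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's actual recipes: the rows are made up, the join key `site_id` and the column `site_name` are hypothetical, the date format is assumed, and a per-site mean of the 2011 consumption stands in for the Random Forest model. The project computes MAPE in an R recipe; Python is used here only to show the arithmetic.

```python
from collections import defaultdict
from datetime import datetime

# Made-up rows; `site_id` and `site_name` are hypothetical column names.
sites = [
    {"site_id": "ID01", "site_name": "site A"},
    {"site_id": "ID18", "site_name": "site B"},
]
records = [
    {"DATE_LOCAL": "2011-03-01 00:00", "site_id": "ID01", "consomation": 100.0},
    {"DATE_LOCAL": "2011-03-01 00:10", "site_id": "ID01", "consomation": 120.0},
    {"DATE_LOCAL": "2011-03-01 00:00", "site_id": "ID18", "consomation": 40.0},
    {"DATE_LOCAL": "2011-03-01 00:10", "site_id": "ID18", "consomation": 60.0},
    {"DATE_LOCAL": "2012-03-01 00:00", "site_id": "ID01", "consomation": 100.0},
    {"DATE_LOCAL": "2012-03-01 00:10", "site_id": "ID01", "consomation": 125.0},
    {"DATE_LOCAL": "2012-03-01 00:00", "site_id": "ID18", "consomation": 40.0},
    {"DATE_LOCAL": "2012-03-01 00:10", "site_id": "ID18", "consomation": 80.0},
]


def left_join(left, right, key):
    """Keep every row of `left`; add the matching `right` columns when the key exists."""
    index = {row[key]: row for row in right}
    return [{**index.get(row[key], {}), **row} for row in left]


def year_of(row):
    """Extract the year from DATE_LOCAL (assumed format: YYYY-MM-DD HH:MM)."""
    return datetime.strptime(row["DATE_LOCAL"], "%Y-%m-%d %H:%M").year


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Undefined if an actual value is 0."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Data preparation 1: left join the records with the site information.
joined = left_join(records, sites, "site_id")

# Data preparation 2: 2011 for training, 2012 for scoring.
train_rows = [r for r in joined if year_of(r) == 2011]
score_rows = [r for r in joined if year_of(r) == 2012]

# Stand-in model (NOT a Random Forest): predict each site's 2011 mean consumption.
history = defaultdict(list)
for row in train_rows:
    history[row["site_id"]].append(row["consomation"])
site_mean = {site: sum(vals) / len(vals) for site, vals in history.items()}

# MAPE globally and per site.
actual = [r["consomation"] for r in score_rows]
predicted = [site_mean[r["site_id"]] for r in score_rows]
global_mape = mape(actual, predicted)

per_site = {}
for site in site_mean:
    rows = [r for r in score_rows if r["site_id"] == site]
    per_site[site] = mape([r["consomation"] for r in rows], [site_mean[site]] * len(rows))

print(f"global MAPE: {global_mape:.2f}%")
for site, score in sorted(per_site.items()):
    print(f"{site} MAPE: {score:.2f}%")
```

Because MAPE divides by the actual value, records with zero consumption would need to be filtered out or handled separately before scoring.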

Ready to enter Dataiku DSS?

If you have never used DSS, it may be worth familiarizing yourself with DSS concepts first.


This sample is already available in your DSS!

From your DSS home page, click on "Sample projects".

If your DSS server doesn't have Internet access, you can download this sample and import it manually (click on "Import project").

Don't have Dataiku DSS yet? Try for free now

