This sample project is based on data from a Kaggle challenge.
Many retail businesses need accurate forecasts of the revenue produced by each of their stores. These forecasts support planning and staffing optimization, and help ensure that each store has the necessary supply. Without them, a business may waste money by overstocking a store or, worse yet, lose revenue because a store does not have enough stock to meet demand.
In this project, we use historical data from the Rossmann drugstore chain to build a predictive model that forecasts the revenue of each of its stores. This model can be run weekly or monthly to give business stakeholders predictions of the revenue for the coming days or weeks. This information can then be used to optimize business practices and streamline operations.
We want to build a project to answer the following questions:
- What is the expected revenue for each store on each day?
- What factors influence the revenue of a store most?
How do we do this?
We start with two different data sources:
- two datasets with the revenue per store per day, split between our historical data (used to train the model), and our forecasting data (used to deploy our model)
- a dataset with information about each store.
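To make the relationship between these sources concrete, here is a minimal sketch of how per-day revenue rows might be enriched with store information before modeling. The field names (`store`, `date`, `revenue`, `store_type`), the helper `join_store_info`, and the toy values are all assumptions made up for illustration; they are not taken from the actual datasets.

```python
# Toy records with hypothetical field names and made-up values.
historical_sales = [
    {"store": 1, "date": "2015-07-01", "revenue": 100.0},
    {"store": 2, "date": "2015-07-01", "revenue": 200.0},
    {"store": 1, "date": "2015-07-02", "revenue": 150.0},
]
stores = [
    {"store": 1, "store_type": "a"},
    {"store": 2, "store_type": "c"},
]


def join_store_info(sales_rows, store_rows):
    """Left-join each sales row with its store's attributes on the 'store' key."""
    store_by_id = {row["store"]: row for row in store_rows}
    joined = []
    for sale in sales_rows:
        # Sales rows for unknown stores are kept, just without extra attributes.
        info = store_by_id.get(sale["store"], {})
        joined.append({**info, **sale})
    return joined


enriched = join_store_info(historical_sales, stores)
```

The same join would apply to both the historical and the forecasting dataset, so that the model sees identical store attributes at training time and at prediction time.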
Like many data projects, we then proceed with three steps:
- Data Cleaning: we clean our data and build our features
- Predictive Modeling: we build and deploy a predictive model
- Visualization: we create a useful visualization of our predicted data
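The three steps above can be sketched end to end as follows. This is only an illustrative outline under stated assumptions: the field names, the function names, and the toy data are hypothetical, and the "model" is a deliberately naive baseline (mean revenue per store and weekday) standing in for whatever predictive model the project actually uses.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def clean(rows):
    """Step 1: drop rows without revenue and derive a weekday feature."""
    cleaned = []
    for row in rows:
        if row["revenue"] is None:
            continue
        weekday = date.fromisoformat(row["date"]).weekday()  # Monday == 0
        cleaned.append({**row, "weekday": weekday})
    return cleaned


def fit_baseline(rows):
    """Step 2: naive stand-in model, the mean revenue per (store, weekday)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["store"], row["weekday"])].append(row["revenue"])
    return {key: mean(values) for key, values in groups.items()}


def predict(model, store, day):
    """Return the baseline forecast, or None if this store/weekday was never seen."""
    return model.get((store, date.fromisoformat(day).weekday()))


def summarize(model, store_ids, days):
    """Step 3: total predicted revenue per store, ready to hand to a chart."""
    totals = {}
    for store in store_ids:
        predictions = [predict(model, store, day) for day in days]
        totals[store] = sum(p for p in predictions if p is not None)
    return totals


# Made-up history: all valid rows fall on Wednesdays.
history = [
    {"store": 1, "date": "2015-07-01", "revenue": 100.0},
    {"store": 1, "date": "2015-07-08", "revenue": 120.0},
    {"store": 1, "date": "2015-07-02", "revenue": None},  # dropped by clean()
    {"store": 2, "date": "2015-07-01", "revenue": 200.0},
]
model = fit_baseline(clean(history))
forecast = summarize(model, [1, 2], ["2015-07-15"])
```

Keeping each step as its own function mirrors the structure of the project: the cleaning logic can be reused unchanged on the forecasting data, and the baseline can be swapped for a real model without touching the other two steps.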
Let’s go through each one of those steps in more detail to see what we did.