Time series in Python

May 20, 2015

Python and pandas make working with time series data efficient. In this article, I’ll show you how to visualize a time series, how to resample it, and how to compute moving averages.

Load your dataset

# -*- coding: utf-8 -*-
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd

dataset = dataiku.Dataset("dataset")
df = dataset.get_dataframe()

# Set your date column as a DatetimeIndex:
df.index = pd.DatetimeIndex(pd.to_datetime(df.date_parsed))

Once your index is a DatetimeIndex, you can use pandas’ time series functions.
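If you are not working inside Dataiku, the same index setup can be tried on a small hand-made dataframe. This is only a sketch: the values are invented, and the column name `date_parsed` is reused from the snippet above purely for illustration. It also shows one thing a DatetimeIndex makes possible, selecting rows by a partial date string.

```python
import pandas as pd

# Hypothetical stand-in for the dataset: a date column stored as strings.
df = pd.DataFrame(
    {
        "date_parsed": ["2015-01-30", "2015-01-31", "2015-02-01", "2015-02-02"],
        "visits": [120, 95, 130, 110],  # made-up values
    }
)

# Parse the strings and use them as the index, as in the snippet above.
df.index = pd.DatetimeIndex(pd.to_datetime(df["date_parsed"]))

# With a DatetimeIndex, .loc accepts a partial date string such as a month.
february = df.loc["2015-02"]
print(february)
```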

Visualize your data

pandas handles the DatetimeIndex and plots a graph with the right x-axis: your dates.

Time series plot of visits, unique visitors, and page views
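A plot like this can be produced with `DataFrame.plot`. The sketch below uses synthetic data, since the original dataset is not available; the column names `visits` and `page_views` are assumptions borrowed from the caption, and matplotlib is assumed to be installed.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the real dataset (values are random).
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=90, freq="D")
df = pd.DataFrame(
    {
        "visits": rng.integers(100, 200, size=90),
        "page_views": rng.integers(300, 600, size=90),
    },
    index=dates,
)

# Because the index is a DatetimeIndex, the dates go on the x-axis
# and each column is drawn as its own line.
ax = df.plot(figsize=(10, 4), title="Daily traffic (synthetic)")
ax.set_xlabel("date")
```

In a notebook the figure renders inline; in a script you would probably call `ax.figure.savefig(...)` or `matplotlib.pyplot.show()` afterwards.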

Resample your time series

The resample function in pandas lets you specify a frequency key (D: day, W: week, M: month, Y: year) and an aggregation method, such as “sum”, “mean”, or any other function, to reshape your dataframe.

Here, I aggregated my variables by summing them over two-week periods:

Time series plot aggregated by two-week periods
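A two-week aggregation of this kind might look like the sketch below. It uses a synthetic series rather than the original data, and it is written against the current method-chaining API (`resample(...).sum()`); pandas releases from the time of this article took a `how="sum"` argument instead.

```python
import pandas as pd

# Hypothetical daily series: 28 days with one visit per day,
# so the aggregated totals are easy to check by eye.
dates = pd.date_range("2015-01-05", periods=28, freq="D")
df = pd.DataFrame({"visits": 1}, index=dates)

# "2W" groups the rows into two-week bins; .sum() adds up each bin.
# Swap .sum() for .mean() (or another aggregation) as needed.
biweekly = df.resample("2W").sum()
print(biweekly)
```

Weekly bins are anchored on Sundays by default, so the first and last bins may cover less than a full two weeks of data.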

Compute a moving average

pandas has built-in functions to compute moving averages. For instance, the ewma function performs exponential smoothing:

Plot of moving averages
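As a rough sketch of both kinds of moving average, the example below computes a simple rolling mean and an exponentially weighted one on a made-up series. Note that the standalone `ewma` function has been removed from recent pandas versions; the equivalent used here is `ewm(...).mean()`. The window and span values are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up daily series: the values 1.0 to 10.0 over ten days.
dates = pd.date_range("2015-01-01", periods=10, freq="D")
s = pd.Series(range(1, 11), index=dates, dtype="float64", name="visits")

# Simple moving average over a 3-day window.
# The first two values are NaN because the window is not yet full.
sma = s.rolling(window=3).mean()

# Exponentially weighted moving average.
# span=3 corresponds to a smoothing factor alpha = 2 / (span + 1) = 0.5.
# adjust=False uses the recursive form y_t = (1 - alpha) * y_{t-1} + alpha * x_t.
ema = s.ewm(span=3, adjust=False).mean()

smoothed = pd.DataFrame({"visits": s, "sma_3": sma, "ema_3": ema})
print(smoothed)
```

Calling `smoothed.plot()` would draw the raw series and both smoothed versions on the same date axis.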