Let’s get rid of it, bro
To keep costs within budget and produce feasible results, the project goal must be defined precisely. For this example, our goal is to score the likelihood of patient no-shows in real time. These scores would be used to identify high-risk patients and to schedule the best time slots for them, decreasing the likelihood of subsequent no-shows.
To build an algorithm, the predictive analytics solution needs data to work with. If possible, provide three months' worth of historical show/no-show data; if that is not available, you may need to collect it for three months before the predictive modeling process can begin.
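As a rough sketch of what such a historical show/no-show extract might look like, here is a tiny sample in Python. The column names and values are invented for illustration, not a fixed schema:

```python
import csv
import io

# Hypothetical 3-month appointment history extract; field names are
# illustrative only (showed_up: 1 = patient appeared, 0 = no-show).
raw = """patient_id,appointment_date,slot,showed_up
P001,2016-01-04,09:00,1
P002,2016-01-04,09:30,0
P001,2016-02-15,14:00,1
P003,2016-03-01,08:30,0
"""

records = list(csv.DictReader(io.StringIO(raw)))

# A first sanity check on the collected data: the overall no-show rate.
no_show_rate = sum(r["showed_up"] == "0" for r in records) / len(records)
print(f"{no_show_rate:.0%} of historical appointments were no-shows")
```

Even a simple aggregate like this helps confirm the data is usable before any modeling starts.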
Next, we need to determine the datasets that will be used to establish patient scoring, that is, the factors that determine whether or not a patient is likely to appear for a given time slot. Some possibilities include:
Just like Sherlock, ask yourself the right questions
Some key questions to answer: How frequently are these datasets updated? Are the updates automated? Is accurate, up-to-date data available?
Datasets are commonly available in different formats (xls, calendar files…), so one of the challenges of data collection is reshaping them all into a common, processing-friendly format.
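A minimal sketch of that reshaping step, assuming one spreadsheet-style export and one calendar-style entry (field names and layouts are invented for illustration):

```python
from datetime import datetime

# Two hypothetical source records in different formats.
xls_row = {"Patient": "P001", "Date": "04/01/2016", "Attended": "Yes"}
ics_line = "DTSTART:20160104T090000;ATTENDEE:P002;STATUS:NOSHOW"

def from_xls(row):
    """Map a spreadsheet-style row onto the common schema."""
    return {
        "patient_id": row["Patient"],
        "date": datetime.strptime(row["Date"], "%d/%m/%Y").date(),
        "showed_up": row["Attended"] == "Yes",
    }

def from_ics(line):
    """Map a calendar-style entry onto the same common schema."""
    fields = dict(part.split(":", 1) for part in line.split(";"))
    return {
        "patient_id": fields["ATTENDEE"],
        "date": datetime.strptime(fields["DTSTART"][:8], "%Y%m%d").date(),
        "showed_up": fields["STATUS"] != "NOSHOW",
    }

# Both sources now land in one processing-friendly list of records.
rows = [from_xls(xls_row), from_ics(ics_line)]
```

Once every source maps onto the same schema, downstream cleaning and modeling only ever see one format.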
Datasets cleaning, cobblestones polishing: a boring life
Building a predictive model involves a series of normalization and optimization steps designed to maximize model accuracy. Key steps in this process include feature normalization, testing and optimization of candidate models, determination of model accuracy, and the specification of a user strategy. Once the model is defined, the data scientist needs to fit it while guarding against overfitting, evaluate it, and ultimately validate it in order to isolate the relevant features.
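The normalize / fit / evaluate loop above can be sketched with scikit-learn, which is an assumed tooling choice here; the data is synthetic, standing in for real appointment features such as booking lead time or past no-show count:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for two appointment features and a no-show label.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# A held-out split guards against overfitting: the model is validated
# on appointments it never saw during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature normalization and model fitting chained in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

Held-out accuracy well above the base rate is the signal that the fitted model generalizes rather than memorizes.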
The determination of accuracy is done by testing the underlying strategy in practice; for example, given patients who are likely to appear for a given time-slot, do they actually show up as expected? How accurate is the time-slot scoring for patients who do appear? If overbooking is implemented, is it being applied correctly? These questions all need to be addressed in order to determine the accuracy of the underlying analytical model — this involves comparing real-world results with the relevant predictions. This level of additional analysis will enable a data analytics solution to further refine the model’s accuracy, if needed.
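Comparing real-world results with the predictions reduces to a few counts; the predicted scores and actual outcomes below are made up for illustration:

```python
# 1 = predicted (or observed) to show up, 0 = no-show; invented values.
predicted_show = [1, 1, 0, 1, 0, 0, 1, 1]
actually_showed = [1, 0, 0, 1, 0, 1, 1, 1]

# True positives: slots scored "will show" where the patient appeared.
tp = sum(p == a == 1 for p, a in zip(predicted_show, actually_showed))

# Of slots scored "will show", how many patients actually appeared?
precision = tp / sum(predicted_show)
# Of patients who did appear, how many had been predicted to?
recall = tp / sum(actually_showed)

print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Tracking these two numbers over time shows whether the scoring, and any overbooking built on top of it, is behaving as expected.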
Use DSS, Save a Data Scientist
Of course, if you are using an advanced software analytics solution, many of the above steps are automated: it can clean datasets, isolate specific features, and automatically score the likelihood of patient no-shows.
If new features are added, the models need to be retrained. Additionally, data visualization is needed in order to determine whether the new features are relevant.
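Before retraining on a new feature, a quick numeric relevance check complements the visualization; the feature (days since last visit) and its values are hypothetical:

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical new feature checked against the no-show outcome:
# a near-zero correlation would suggest the feature adds little.
days_since_last_visit = [5, 40, 12, 90, 3, 60, 25, 75]
no_show = [0, 1, 0, 1, 0, 1, 0, 1]

r = pearson(days_since_last_visit, no_show)
print(f"correlation with no-shows: {r:.2f}")
```

Correlation is only a first-pass filter, but it cheaply flags features not worth a retraining cycle.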
Like what you read? Keep going! Find out more in Dataiku’s ebook “Advanced Analytics for Efficient Healthcare. Data Driven Scheduling to Reduce No-Shows”.
In this ebook we highlight a specific issue — no-show appointments — and show how healthcare institutions can leverage predictive analytics to discover real-world solutions to a multi-billion dollar problem.