Joshua Cortez, a member of our Data Science Team, has put together a series of blog posts on using survival analysis to predict customer churn. This is part one of the series.
Introduction
Customer churn is familiar to many companies offering subscription services. Simply put, customer churn is the event of a customer opting out of their subscription. Customers may do so for many reasons, including dissatisfaction with their plan (for example, consistently poor reception on a mobile phone network) or the allure of better subscription packages from competitors.
Source (1)
Businesses want to understand how and why their customers churn so they can improve profits and deliver better services. In this blog post series, we’ll explore a branch of statistics called survival analysis to uncover insights that help us understand and curb churn. We’ll use a churn dataset from a blog in the IBM Watson Analytics community that describes a fictitious telco’s customers: how long each customer has stayed and whether they have churned.
Survival Analysis
Survival analysis has traditionally been used in medicine and the life sciences to analyse how long it takes before a person dies, hence the “survival” in survival analysis. However, the field can also model other events that organisations care about, such as the failure of a machine or customer churn. Okay, cool. But what kinds of insights can we get from survival analysis?
We’ll talk about two main ideas in more detail in future blog posts: survival curves (in part II), and survival regression (in part III). We’ll discuss what they are, and what kinds of insights they bring to the table.
For today, we’ll give a brief introduction to these concepts and an overview of our sample dataset.
1. Survival Curves
Source (2)
An example survival curve: charting the results lets us visualise how the likelihood of churn changes over time (2).
What we can do with it:
i. Show how the likelihood of customer churn changes over time.
ii. Determine the optimal intervention point.
Questions it can answer:
i. How many years/months on average do our customers stay?
ii. How long do male customers stay compared to female customers?
iii. Does our understanding of our customer lifecycle match reality?
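Part II covers survival curves in detail, but as a preview: one common way to estimate a survival curve from churn data is the Kaplan-Meier estimator (our choice for this sketch; this post doesn't commit to a particular method). Customers who haven't churned yet are treated as censored: we only know they stayed at least as long as their current tenure. The minimal pure-Python sketch below uses made-up tenures, not the telco data, and also reads off the median survival time, which is the usual way to answer "how long do our customers stay?" because a plain average of observed tenures understates lifetimes when many customers are still active.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t), the probability a customer is still subscribed after t.

    durations: tenure of each customer (e.g. in months)
    events:    1 if the customer churned at that tenure, 0 if still subscribed (censored)
    Returns a list of (t, S(t)) pairs, one per distinct churn time.
    """
    churns_at = Counter(t for t, e in zip(durations, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(churns_at):
        # Customers still under observation at time t (censored customers count until their tenure ends).
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1 - churns_at[t] / at_risk
        curve.append((t, survival))
    return curve

def median_survival(curve):
    """First time the estimated survival falls to 0.5 or below; None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Made-up tenures (months) for eight customers; 0 in `events` means still subscribed.
durations = [1, 3, 3, 5, 8, 8, 12, 12]
events = [1, 1, 0, 1, 1, 0, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"month {t:>2}: S(t) = {s:.3f}")
print("median survival:", median_survival(curve), "months")
```

With this toy data the curve steps down at months 1, 3, 5, 8 and 12 (0.875, 0.75, 0.6, 0.45, 0.225), so the estimated median lifetime is 8 months. Note that the customer censored at month 3 still counts as "at risk" at month 3 but drops out afterwards, which is exactly how the estimator uses partial information.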
2. Survival Regression
Survival regression lets us model how survival time depends on customer characteristics, and so predict when an event such as churn is likely to occur.
What we can do with it:
i. Model the relationship between customer churn, time, and other customer characteristics.
Questions it can answer:
i. What’s the probability that a particular customer, for example a female non-senior citizen with dependents, will stay for at least 2 years?
ii. What are the significant factors that drive churn?
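A widely used survival regression model is the Cox proportional hazards model (named here as one common option; Part III may use others). It assumes each characteristic multiplies a customer's churn hazard by a constant factor exp(β), called the hazard ratio. Under that assumption, a customer's survival curve is the baseline curve raised to the power exp(β·x), which is how a question like the 2-year one above can be answered. The sketch below uses hypothetical coefficients and baseline survival values, not estimates from the telco data; in practice they would come from fitting the model, for example with the lifelines library's `CoxPHFitter`.

```python
import math

# HYPOTHETICAL values for illustration only; a fitted model would supply these.
# Each coefficient is beta for a 0/1 indicator; exp(beta) > 1 means a higher churn hazard.
COEFFS = {"female": 0.05, "senior_citizen": 0.40, "dependents": -0.35}
# Hypothetical baseline survival S0(t) (all indicators = 0) at t = 12 and 24 months.
BASELINE_SURVIVAL = {12: 0.80, 24: 0.70}

def hazard_ratio(covariate):
    """Multiplicative effect of a covariate on the churn hazard under the Cox model."""
    return math.exp(COEFFS[covariate])

def survival_probability(customer, months):
    """P(still subscribed at `months`) = S0(t) ** exp(beta . x) under proportional hazards."""
    linear_predictor = sum(beta * customer.get(name, 0) for name, beta in COEFFS.items())
    return BASELINE_SURVIVAL[months] ** math.exp(linear_predictor)

# A female non-senior citizen with dependents.
customer = {"female": 1, "senior_citizen": 0, "dependents": 1}
print(f"P(stays 2 years) = {survival_probability(customer, 24):.3f}")
for name in COEFFS:
    print(f"hazard ratio for {name}: {hazard_ratio(name):.2f}")
```

For this made-up customer the sketch gives roughly 0.768, above the hypothetical baseline of 0.70, because the (invented) dependents coefficient lowers the hazard. Ranking covariates by hazard ratio, together with their statistical significance from the fitted model, is how survival regression addresses "which factors drive churn?".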
Examples of how survival analysis can be applied in industries beyond telecommunications (2):
– Insurance – time until a policy lapses
– Mortgages – time to mortgage redemption
– Mail Order Catalogue – time to next purchase
– Retail – time until a food customer starts purchasing non-food items
– Manufacturing – lifetime of a machine component
– Public Sector – time intervals to critical events
A worked example
Let’s get started by examining our sample churn dataset. It has 7043 customers and 20 variables. Most of the variables are categorical and describe attributes of a customer.
Categorical Variables:
– Gender, SeniorCitizen, Partner, Dependents, PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract, PaperlessBilling, PaymentMethod, Churn
Numeric Variables:
– Tenure, MonthlyCharges, TotalCharges
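For survival analysis, each customer boils down to a (duration, event) pair: Tenure is the duration, and Churn says whether the event was observed or the customer is still active (censored). The stdlib sketch below shows that conversion, assuming Churn is coded "Yes"/"No" and that the headers match the variable names listed above (the downloaded file may capitalise them differently, hence the column-name parameters). The sample rows are made up, not taken from the dataset.

```python
import csv
import io

def to_survival_data(rows, duration_col="Tenure", event_col="Churn"):
    """Turn churn records into parallel lists of durations and event flags.

    event is 1 if the customer churned ("Yes") and 0 if they are still a customer,
    i.e. their true lifetime is right-censored at their current tenure.
    """
    durations, events = [], []
    for row in rows:
        durations.append(int(row[duration_col]))
        events.append(1 if row[event_col].strip() == "Yes" else 0)
    return durations, events

# Made-up rows for illustration only; for the real file, pass csv.DictReader(open(path, newline="")).
SAMPLE = """Gender,Tenure,MonthlyCharges,Churn
Female,1,20.05,Yes
Male,24,70.10,No
Female,60,99.50,No
"""

durations, events = to_survival_data(csv.DictReader(io.StringIO(SAMPLE)))
print(durations, events)  # [1, 24, 60] [1, 0, 0]
```

These two lists are the input that both survival curves and survival regression work from; the other variables become covariates in the regression.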
Here’s a simple exploratory plot to get to know our data: a histogram of monthly charges, showing how monthly charges are distributed across customers. A large proportion of customers pay around $20 per month.
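The plot itself isn't reproduced here. As a rough, stdlib-only sketch, the counts behind such a histogram can be computed by bucketing charges into $10 bins (an arbitrary bin width chosen for illustration); with plotting available, matplotlib's `hist` function draws the same thing. The values below are made up, not taken from the dataset.

```python
from collections import Counter

def histogram(values, bin_width=10.0):
    """Count values into fixed-width bins, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def print_histogram(values, bin_width=10.0):
    """Text version of the plot: one row of '#' characters per non-empty bin."""
    for lower, n in histogram(values, bin_width).items():
        print(f"${lower:>5.0f} to ${lower + bin_width:<5.0f} {'#' * n}")

# Made-up monthly charges; with the real file you would read the MonthlyCharges column instead.
monthly_charges = [19.90, 20.05, 20.25, 24.10, 45.30, 70.70, 74.40, 99.65]
print_histogram(monthly_charges)
```

Empty bins are simply omitted from the text output; a plotting library would draw them as gaps.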
If you want to explore the data further, you can download the CSV file from here.
In the next post we’re going to talk about survival curves and apply these to our dataset.
Sources:
(1) http://www.superoffice.com/blog/wp-content/uploads/sites/3/2015/05/reduce-customer-churn.png
(2) http://www.barryanalytics.com/Downloads/Presentations/Survival Analysis.pdf
3 Responses
Could we still use this model when we don’t know the exact time the customer churned? For example, they didn’t shop with us for a year. We don’t know for sure but we can safely assume they churned. The problem is this situation makes the ‘churn’ variable dependent on the ‘tenure’ variable. Does that make the model unusable for this situation?
Thanks!
Hi Jaryn,
When you don’t have a specific churn date, you can turn the premise around slightly. Essentially, you ask how likely this person is to purchase from us again within 1, 2, 3, 6, 9, 12,… months. This is sometimes referred to as customer buying intention or customer purchase probability.
This isn’t churn per se, but if you have enough historical data and your model is accurate enough, it can provide an approximation. Acting on it can be more costly: you’re more likely to overspend on trying to keep customers who weren’t going to leave yet. It very much depends on how frequently customers interact with the organisation and make purchases.
This is also where I would push back and ask: is a churn model really what you want in this situation? Churn models are popular in the analytics world because they work well in very specific situations, namely regular customers with frequent purchases or use of a service in an essential domain (one where people who leave generally go to another service provider, e.g. telcos, banks, utility providers). If your business doesn’t work like this, other approaches would probably serve you better.
I want to know about this dataset and the Cox model.