How to use survival analysis to predict when employees will leave

Want to be able to predict when an employee is going to leave? Survival analysis can help.


We all understand the cost of employee turnover. Because it is expensive, it’s vitally important to understand the dynamics of how and when employees exit your business.

Just like death and taxes, in the world of HR, turnover is guaranteed. In fact, according to the Australian Bureau of Statistics, job mobility is at an all-time high with 1.3 million people changing jobs in the past year. Breaking this down by state, the Australian Capital Territory (ACT) had the highest rate of job mobility (12.8% in the year ending February 2022), with Western Australia (WA) at 11.4% and Northern Territory (NT) at 10.9%, following closely behind.

It sounds a little bleak, but it’s not. At some point, all employees will exit the business, whether within their probation period or after a fruitful twenty years or more.

Once we, as HR professionals, come to terms with this fact, it follows that we would want to predict how long an employee can be expected to stay with our business once they’re hired. This is useful in many domains, from estimating the probability that a new hire will return the investment made in them to modelling your future workforce by considering future movements in and out of your business.

That’s where survival analysis comes in. In this post, we’ll show you how survival analysis can help answer questions such as: what proportion of our organisation will stay with the business past a certain time? Or, given that employees reach a certain tenure, at what rate would we expect them to leave the business?

What is survival analysis?

Survival analysis is a statistical method aimed at determining the expected duration of time until an event occurs. In this instance, the event is an employee exiting the business.

As the name might suggest, survival analysis was developed in biomedical sciences to analyse the proportion of patients surviving to particular times after the application of a treatment. Since then, it’s been applied to many situations where the event of interest is binary: it either happens or it doesn’t. Examples include:

  • Engineering (Reliability Analysis): how long until a machine or part fails?

  • Sales (Churn Prediction): how long until a customer terminates their contract?

  • Human resources: how long until an employee terminates?

Technical definition: Survival analysis is a set of statistical approaches used to investigate the time until an event of interest.

In plain English: Expected time until an event happens.


Why survival analysis?

Turnover calculations are an important metric in most businesses. They provide a way to track the movement of employees out of your business and to identify potential risks. However, attrition rates calculated in isolation can sometimes be misleading. They are heavily affected by the reporting period: a downturn or an acquisition, for example, may significantly skew the output for a particular period.

From a retention strategy perspective, an attrition rate ignores important patterns because it considers time as a chronology, rather than a variable. That is, it treats the exit of a recent starter the same as that of a tenured veteran. As a result, it misses patterns such as milestone-based turnover trends or the staying power of certain employee groups, both of which help inform retention strategy.


Understanding turnover and tenure

Turnover and tenure have a complex relationship. A business with high turnover will logically have a lower average tenure, and at the same time, tenure is an important factor in the decision to exit. It follows that any analysis of attrition in isolation from tenure will smooth over important insight.

Considering the above diagram, which visualises the careers of individuals within an organisation (each line represents the start and end date of an employee), a traditional attrition calculation would simply count the number of end-points during a period and divide by the number of lines present at the beginning of the period.
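To make the traditional calculation concrete, here is a minimal Python sketch of it. The careers, the dates and the `attrition_rate` helper are all made up for illustration and are not taken from any real dataset or product.

```python
from datetime import date

# Hypothetical careers: (start_date, end_date); end_date is None if still employed.
careers = [
    (date(2019, 3, 1), date(2021, 8, 15)),
    (date(2020, 1, 10), None),
    (date(2020, 6, 1), date(2021, 2, 1)),
    (date(2021, 4, 1), date(2021, 11, 30)),
    (date(2018, 7, 1), None),
]


def attrition_rate(careers, period_start, period_end):
    """Exits during the period divided by headcount at the start of the period."""
    headcount_at_start = sum(
        1
        for start, end in careers
        if start <= period_start and (end is None or end >= period_start)
    )
    exits_in_period = sum(
        1
        for _, end in careers
        if end is not None and period_start <= end <= period_end
    )
    return exits_in_period / headcount_at_start


rate_2021 = attrition_rate(careers, date(2021, 1, 1), date(2021, 12, 31))
print(f"Attrition for 2021: {rate_2021:.0%}")
```

Notice that the calculation never looks at how long each person had been employed when they left: the exit of someone with eight months of tenure counts exactly the same as the exit of someone with more than two years.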

Survival analytics (like intelliHR’s) considers all historical and current data points together, including those that remain employed, and groups them by key tenure groups, delivering useful, predictive insights about turnover probability.


Calculating the survival function

The challenge

The challenge with predicting employee turnover is that for anyone who is still employed at the time of observation, their future behaviour is uncertain. They might resign the next day, continue for another ten years, or anything in between. All we know is that their career will last at least as long as their current tenure. Observations like these are said to be right-censored.

Each line segment below represents the career of an employee from start to finish date. Survival analytics converts these career lengths to tenures to calculate the probability of reaching each milestone.

To convert time into tenure, which becomes the explanatory variable in the survival function, all start dates are normalised to a time zero (see below).
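As a rough illustration of this normalisation, the Python sketch below turns start and end dates into a tenure plus a flag that records whether the exit was actually observed. The sample careers, the observation date and the `to_tenure` helper are assumptions made for the example; the average month length of 30.44 days is simply a convenient approximation.

```python
from datetime import date

# Hypothetical date on which the data was extracted.
observation_date = date(2022, 6, 30)

# Hypothetical careers: (start_date, end_date); end_date is None if still employed.
careers = [
    (date(2019, 3, 1), date(2021, 8, 15)),
    (date(2020, 1, 10), None),
    (date(2021, 4, 1), date(2021, 11, 30)),
]


def to_tenure(start, end, observed_on):
    """Return (tenure_in_months, exited) with the start date treated as time zero.

    For current employees the tenure runs to the observation date and
    exited is False, which marks the record as right-censored.
    """
    last_day = end if end is not None else observed_on
    tenure_months = (last_day - start).days / 30.44
    return tenure_months, end is not None


records = [to_tenure(start, end, observation_date) for start, end in careers]
for tenure_months, exited in records:
    status = "exited" if exited else "still employed (censored)"
    print(f"{tenure_months:5.1f} months, {status}")
```

Keeping the censoring flag alongside the tenure is what lets a survival method use current employees as evidence without pretending they have already left.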

Survival analytics calculates the probability of a termination occurring, based on the number of employees terminated and the size of the sample still remaining at the time.

intelliHR uses the Kaplan-Meier method to estimate the survival function because it can handle right-censored data. This is important for an HR tool, as right-censored data is so common: every current employee is a censored observation. The method incorporates information from all available observations by splitting tenure into logical milestones (e.g. six months) and considering the probability of reaching the next milestone (e.g. one year), given that all previous milestones were reached.

The survival function is the probability that an employee’s career will last longer than a particular time, t.
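For readers who want to see the mechanics, here is a minimal sketch of the standard Kaplan-Meier (product-limit) estimator in plain Python. It is not intelliHR's implementation: the `kaplan_meier` function and the sample records are hypothetical, and the sample tenures are simply chosen to fall on six-month marks rather than being grouped by the code.

```python
def kaplan_meier(records):
    """Estimate the survival curve from (tenure, exited) pairs.

    exited is True when the employee left at that tenure and False when
    they were still employed at observation (right-censored).
    Returns a list of (tenure, survival_probability) steps.
    """
    exit_tenures = sorted({tenure for tenure, exited in records if exited})
    survival = 1.0
    curve = [(0, 1.0)]
    for t in exit_tenures:
        # Everyone whose tenure reached t was still "at risk" of leaving at t.
        at_risk = sum(1 for tenure, _ in records if tenure >= t)
        exits = sum(1 for tenure, exited in records if exited and tenure == t)
        # Multiply by the chance of getting past t, given t was reached.
        survival *= 1 - exits / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical sample: tenure in months, and whether the exit was observed.
records = [
    (6, True), (6, True), (12, False), (12, True), (18, True),
    (24, False), (30, True), (36, False), (36, False), (48, False),
]

curve = kaplan_meier(records)
for tenure, probability in curve:
    print(f"S({tenure}) = {probability:.3f}")
```

Censored employees never trigger a drop in the curve, but they do count in the at-risk group for every tenure they actually reached, which is how the method uses their information without guessing when they will leave.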

Reading a survival curve

The probabilities calculated above are plotted on the stepped survival curve (below). Although tenure is based on time and is, therefore, a continuous variable, the probabilities are calculated by grouping data into logical milestones of six months, giving it the stepped shape that you can see.

Because the probabilities are cumulative, meaning that the probability of reaching a given milestone relies on each previous milestone having been reached, the function never increases. That is, the likelihood of reaching your third year will always be less than or equal to the probability of reaching your second year.

It’s expected that each survival curve will start at a probability of 1.0 at t=0. That is, there is a 100% likelihood that an employee will last until their first day (although, we know this isn’t always the case, if an employee fails to start). From here, the chart can be read by considering each vertical drop (step) as the change in cumulative probability as tenure advances. As can be seen in the above example:

  • Assuming an employee starts (100%) there is an 80% chance they will make it past six months.

  • The probability of “surviving” more than one and a half years is 7% less than that of reaching one year of tenure.

A steep drop in the curve suggests a greater risk of employees leaving the business at that particular length of tenure. The length of each horizontal line denotes the length of time (tenure) between successive events of interest (employee terminations). A long horizontal line means that no employees terminated across those values of tenure; therefore, according to the data, the probability of surviving beyond that point does not decrease.

Key points:

  • Median: your median survival tenure can be read from the chart by drawing a horizontal line from the 0.5 point on the y-axis. The tenure at the point this horizontal intersects the curve is the duration you can expect 50% of your employees to ‘survive’ until.

  • Cumulative break-even: it is useful to consider the cumulative break-even point for your employees. This can be done at an organisational level, or down to business units or cost centres. By calculating the time it takes for employees to start “paying back” the early investment of onboarding and training, you can use your survival curve to find the probability of employees reaching this milestone. This can be seen in the graphic below.
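Both key points can also be read off a curve programmatically. The sketch below assumes the curve is stored as a list of (tenure in months, survival probability) steps; the step values, the nine-month break-even point and the helper names are all hypothetical.

```python
# Hypothetical survival curve: (tenure_in_months, probability of surviving beyond it).
curve = [(0, 1.0), (6, 0.8), (12, 0.7), (18, 0.58), (30, 0.44)]


def survival_at(curve, tenure):
    """Read the stepped curve: the value of the last step at or before `tenure`."""
    probability = 1.0
    for step_tenure, step_probability in curve:
        if step_tenure > tenure:
            break
        probability = step_probability
    return probability


def median_tenure(curve):
    """First tenure at which the curve is at or below 0.5, or None if it never is."""
    for step_tenure, step_probability in curve:
        if step_probability <= 0.5:
            return step_tenure
    return None


break_even_months = 9  # hypothetical time for a new hire to "pay back" onboarding costs

print("Median survival tenure (months):", median_tenure(curve))
print("Probability of reaching break-even:", survival_at(curve, break_even_months))
```

If the curve never drops to 0.5, which can happen when most of the sample is still employed, the median cannot be read from the data and the sketch returns None rather than extrapolating.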

95% confidence intervals:

The Kaplan-Meier estimate is a statistic calculated from a sample and is therefore subject to variance. The blue shaded areas on the graph represent the range within which the true value is expected to lie, known as 95% confidence intervals.

The smaller your sample or group is, the more variance there will be. Because we expect fewer high-tenured employees, the estimated retention probability becomes less certain as tenure increases. This results in a larger blue-shaded area at higher tenures, denoting a larger range of variability around the true value, which should be considered when making any decisions based on a dataset.
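The article does not say how these intervals are calculated. One common choice, used here purely as an assumption, is Greenwood's formula for the standard error of the Kaplan-Meier estimate combined with a normal approximation. The function name and the sample records below are hypothetical.

```python
import math


def kaplan_meier_with_ci(records, z=1.96):
    """Kaplan-Meier steps with approximate 95% confidence intervals.

    records is a list of (tenure, exited) pairs. Returns a list of
    (tenure, survival, lower, upper) rows. The interval uses Greenwood's
    formula, which is an assumption rather than the article's stated method.
    """
    exit_tenures = sorted({tenure for tenure, exited in records if exited})
    survival = 1.0
    greenwood_sum = 0.0
    rows = []
    for t in exit_tenures:
        at_risk = sum(1 for tenure, _ in records if tenure >= t)
        exits = sum(1 for tenure, exited in records if exited and tenure == t)
        survival *= 1 - exits / at_risk
        if at_risk > exits:
            # Each step adds more variance when few people remain at risk.
            greenwood_sum += exits / (at_risk * (at_risk - exits))
        std_err = survival * math.sqrt(greenwood_sum)
        lower = max(0.0, survival - z * std_err)
        upper = min(1.0, survival + z * std_err)
        rows.append((t, survival, lower, upper))
    return rows


# Hypothetical sample: tenure in months, and whether the exit was observed.
records = [
    (6, True), (6, True), (12, False), (12, True), (18, True),
    (24, False), (30, True), (36, False), (36, False), (48, False),
]

rows = kaplan_meier_with_ci(records)
for tenure, survival, lower, upper in rows:
    print(f"{tenure:>2} months: {survival:.2f} ({lower:.2f} to {upper:.2f})")
```

With a sample this small the intervals are very wide, and they widen further at higher tenures as fewer employees remain at risk, which mirrors the growing shaded area described above.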

Conclusion

Every organisation has a different survival curve, as does each business unit within an organisation. Comparing curves can help identify issues, patterns or trends that might require intervention.
