Risk events that can repeat don’t have a likelihood. They will happen; the only question is when. Events that can repeat have a frequency, not a likelihood. If you confuse frequency with likelihood, you’re confusing risk with cost.
A typical repeatable risk event
You identify a risk. Let’s say it’s something really simple, like this:
Someone falls on the stairs and suffers a minor injury, losing less than half a day in work time.
You then want to assess the likelihood and consequence for your risk. When it comes to the likelihood, you realise that such an incident is almost certain to happen at some point. It may happen many times during the coming year. So ‘almost certain’, then. But are near-certainties even part of risk management? If something is nearly certain, it isn’t a risk, is it? It’s a plan. Or if you can’t control it, a forecast.
On the consequence side, you might think the injury is something that the worker affected would take seriously, and so the enterprise should take it seriously, too.
On the other hand, you might also wonder about the overall ‘consequence’ of a single minor injury, when there are dozens of such injuries in any year. You’re planning for them. Events within an expected level, even injuries, can have no effect on realistic enterprise objectives. There is no deviation from expected achievement.
Even if the enterprise adopted an unrealistic objective of zero workplace injuries, hardly anyone would think that a single minor injury represents failure.
Other repeatable risk events
You will identify many risks as repeatable events, particularly when you first think about what can happen. Repeatable risk events include airliner crashes, earthquakes, collisions in database updates, data centre power failures and countless others. Repeatability is not about scale.
Some risks are not repeatable. That’s not because they are special. Risks that come from an uncertainty about the present, and not from a potential event in the future, do not repeat. Those risks don’t even ‘happen’ at any particular time. An example of a pure uncertainty arises when you are playing poker. Before the showdown, you are uncertain about whether you have the best cards. That uncertainty matters. While you’re placing bets, or folding, your cards aren’t changing, and neither are your opponent’s cards. You just don’t know if your opponent’s cards are better or worse than yours.
Whether a ‘risk’ is or is not a repeatable event often comes down to how you look at it. For example, ‘losing a poker hand’ is a repeatable event, and what you care about is the total losses over the session.
The problem with repeatable risk events
Your minor staircase injury ‘risk’ was: Someone falls on the stairs and suffers a minor injury, losing less than half a day in work time.
That can definitely happen, and it matters. Yet that risk presents a near-certainty that enterprise objectives will be achieved as planned. It is clearly ‘a risk’ worth considering, and yet it represents zero risk.
That isn’t useful, is it? It does not advance ‘risk management’. It does not help your enterprise. It does not help the worker on the stairs. It does not help you.
This post solves the problem, and it will help you.
The first part of the solution is that repeatable risk events don’t even have a likelihood. Your minor injury risk scenario describes a risk event that is sure to occur, and to occur repeatedly, sooner or later. It doesn’t have a meaningful likelihood on its own.
The second part of the solution is that repeatable risk events can always be re-described as instance counts or totals over a time interval. Instance counts and totals over an interval do have a likelihood, other than near-certainty. They can also have a meaningful effect on enterprise objectives—that is, a consequence.
Why repeatable risk events don’t have ‘likelihood’
Any risk event that can occur multiple times is certain to occur sooner or later. It’s not an ‘if’, it’s a ‘when’.
The repeatable event has a long-term average frequency, rather than a likelihood.
To nail down a likelihood, you first need to set a time interval. That interval can be short (the millisecond processing of an online transaction) or long (the lifetime of a city in an earthquake zone). It can be defined in seconds, weeks, or years, or by the passing of future milestones. But you need to understand that time interval. That time interval is the life-span of the objectives affected by uncertainty.
For practical risk assessment, you might set a very literal time interval, like the coming financial year.
The risk event itself has a frequency, not a likelihood. The occurrence of any given number of instances of the risk event within your chosen interval does have a likelihood.
That likelihood follows from the long-term frequency of the risk event, the length of the interval, and from any tendency for occurrences of the event to follow a pattern.
Patterns might be a tendency for events to occur in clusters, or perhaps a tendency to occur cyclically, with a predictable amount of quiet in between. Usually, you keep things simple by assuming that the repeating events occur at random with no pattern at all. That would be a reasonable assumption for staircase injuries, other than injuries during emergency evacuations.
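To see why the interval matters, here is a minimal sketch in Python. It assumes the no-pattern case, which is usually modelled as a Poisson process, and an assumed long-term frequency of 15 injuries per year. The function name and the chosen intervals are illustrative only.

```python
import math

def prob_at_least_one(rate_per_year: float, years: float) -> float:
    """Chance of one or more instances within the interval.

    Assumes instances occur at random with no pattern (a Poisson process),
    so the chance of zero instances is exp(-rate * interval).
    """
    return -math.expm1(-rate_per_year * years)  # same as 1 - exp(-rate * years)

ASSUMED_RATE = 15.0  # injuries per year: an assumption, not a known fact

intervals = [
    ("one day", 1 / 365),
    ("one week", 1 / 52),
    ("one year", 1.0),
    ("five years", 5.0),
]
for label, years in intervals:
    print(f"{label:>10}: {prob_at_least_one(ASSUMED_RATE, years):.4f}")
```

The same frequency gives a small likelihood over a day and a near-certainty over a year. The event has one frequency, but a different likelihood for every interval you choose.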
Describing repeatable events for risk likelihood assessment
Most often, you will care about instance counts inside or outside a range, rather than exact count values. For minor staircase injuries over a year, you might think of 0-9 injuries as fortunate, 10-19 as normal, 20-24 as qualified success, 25-49 as disappointing, and 50 or more as ‘bad’, representing a clear failure of workplace safety.
So you set aside your original risk description, which read: Someone falls on the stairs and suffers a minor injury, losing less than half a day of work time.
In its place you describe six separate risks in your risk register, each with a different number range for injury instances. The fortunate 0-9 range is split in two, to separate a good year from an excellent one.
|Risk description||Assessed Likelihood**||Consequence level (for workplace injuries)|
|During the year there are 50 or more cases of staircase falls resulting in minor injury*.||0%||Bad|
|During the year there are 25-49 cases of staircase falls resulting in minor injury.||1%||Disappointing|
|During the year there are 20-24 cases of staircase falls resulting in minor injury.||11%||Qualified success|
|During the year there are 10-19 cases of staircase falls resulting in minor injury.||81%||Planned (success) not a risk|
|During the year there are 5-9 cases of staircase falls resulting in minor injury.||7%||Good|
|During the year there are fewer than 5 cases of staircase falls resulting in minor injury.||0.1%||Excellent|
*A ‘minor’ injury is one that loses less than half a day of work time.
**The assessed likelihood is a subjective number, picked from a credible range for practical risk assessment purposes. Likelihoods are not facts. You can’t make them into facts by blurring them into ranges or into soft words like ‘possible’. In this example, the likelihoods were calculated from an assumed long-term frequency for minor staircase injuries. That long-term frequency would never actually be known, so despite the calculation stage, those assessed likelihoods are not facts.
Each of these ‘risks’ has a different likelihood and a different consequence. Each risk has a likelihood and a consequence within a range that helps with making decisions to accept or change the risk exposure. You did not have such a helpful likelihood and consequence when the risk was described only as the repeatable event, the minor staircase injury. With that risk description, you only had a near-certainty that outcomes would match objectives. That near-certainty is shown to be misleading by the likelihoods for separate instance count ranges. (Those figures are themselves disputable.)
If a single instance is enough to affect your year, your number ranges for separate risk descriptions might be more like ‘none’, ‘one’ and ‘two or more’. If a single instance of your repeatable risk event is enough to change the direction of your enterprise, then you don’t need to consider number ranges other than ‘zero’ instances and ‘one’ instance. Many risks with non-trivial consequences fit this pattern. For those risks, you won’t need to change the risk descriptions to separate instance counts or total ranges.
Sometimes you might care about the total effect of all the instances that occur, rather than a simple count. Your repeatable event might come in different sizes per instance, or with consequences of different sizes. You will then be most concerned with the total consequences from whatever number of instances occur during the time interval. You would re-define separate risks based on ranges for the total consequences.
For a simple example, in the case of injuries from staircase falls, you might choose to understand the total consequence of minor injuries in terms of the total work time lost over the year. You could re-define staircase injury risks like this.
|Risk description||Assessed Likelihood**||Consequence level (illustration only)|
|During the year, more than 25 days of work time are lost due to staircase falls.||1%||Disappointing|
|During the year, 20-24 days of work time are lost due to staircase falls.||20%||Qualified success|
|During the year, 9-20 days of work time are lost due to staircase falls.||75%||Planned (success) not a risk|
|During the year, 4-8 days of work time are lost due to staircase falls.||4%||Good|
|During the year, less than 3 days of work time are lost due to staircase falls.||0%||Excellent|
**The assessed likelihood is a subjective number, picked from a credible range for practical risk assessment purposes. Likelihoods are not facts. You can’t make them into facts by blurring them into ranges or into soft words like ‘possible’. In this example, the likelihoods were pulled from the air.
Why counts and totals are a good idea
Defining risks in terms of counts or totals means that every risk description is something that will happen (once), or won’t happen, over the time covered by the risk assessment.
You can then assess a meaningful likelihood for each registered risk, even if that meaningful likelihood is as subjective and uncertain as any other likelihood used in risk assessment. You can also assess the consequences of each risk in terms of effects on objectives over the time.
If you don’t do that, you’re confusing likelihood with frequency. To treat likelihood and frequency as equivalent is to regard high-frequency low-impact events as equivalent to low-frequency high-impact events. The long-term total of impact may be the same, but the difference between them is the uncertainty of the final impact over any given block of time. If you disregard that uncertainty, you’re disregarding risk. ‘Risk’ is the uncertainty itself. Risk is not the expected cost of doing business.
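A small worked comparison makes the point. This is a sketch under simplifying assumptions: instances arrive at random (Poisson), and every instance has the same fixed impact. The two scenarios and their figures are invented for illustration.

```python
import math

def annual_total_stats(rate_per_year: float, impact_per_event: float):
    """Mean and standard deviation of the annual total impact.

    Assumes Poisson arrivals and a fixed impact per event, so the total is
    impact * count, where the count has mean and variance equal to the rate.
    """
    mean = rate_per_year * impact_per_event
    std_dev = math.sqrt(rate_per_year) * impact_per_event
    return mean, std_dev

# Hypothetical scenarios with the same long-term average cost per year.
frequent_small = annual_total_stats(rate_per_year=100.0, impact_per_event=1_000.0)
rare_large = annual_total_stats(rate_per_year=0.01, impact_per_event=10_000_000.0)

for name, (mean, std_dev) in [("frequent, small", frequent_small),
                              ("rare, large", rare_large)]:
    print(f"{name:>16}: expected total {mean:,.0f}, spread (std dev) {std_dev:,.0f}")
```

Both scenarios have the same expected annual total, but the spread around that expectation differs by a factor of 100. The expected total is the cost; the spread is the risk.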
Some types of risk are necessarily described as something that either will or won’t affect objectives, with no reference to counting or totalling instances. For example, a planning assumption is either true or not true. You won’t need anything from this post to deal with those.
How to find the likelihoods of counts and totals
Given a long-term frequency of minor staircase injuries, it is possible to calculate the likelihood of each instance count range for the next year. For example, the likelihood of a disappointing 25-49 injuries might be 1%. That would follow mathematically if the average long-term frequency is known to be exactly 15 injuries per year, with no pattern of instances. The other likelihoods in the minor injuries example were derived in the same way from the same assumptions. The key concept here is the Poisson distribution.
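The calculation can be sketched in a few lines of Python. This assumes exactly what the paragraph above assumes: a Poisson distribution with a mean of 15 injuries per year and no pattern of instances. The function names are my own, and the ranges are the ones used in the minor injuries table.

```python
import math

def poisson_pmf(count: int, mean: float) -> float:
    """Likelihood of exactly `count` instances, computed in log space for stability."""
    return math.exp(count * math.log(mean) - mean - math.lgamma(count + 1))

def prob_count_in_range(low: int, high: int, mean: float) -> float:
    """Likelihood that the instance count falls in low..high inclusive."""
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))

MEAN_PER_YEAR = 15.0  # assumed long-term frequency; never truly known

ranges = {
    "fewer than 5": (0, 4),
    "5-9": (5, 9),
    "10-19": (10, 19),
    "20-24": (20, 24),
    "25-49": (25, 49),
}
likelihoods = {
    label: prob_count_in_range(low, high, MEAN_PER_YEAR)
    for label, (low, high) in ranges.items()
}
# Whatever is left over belongs to the open-ended top range.
likelihoods["50 or more"] = max(0.0, 1.0 - sum(likelihoods.values()))

for label, likelihood in likelihoods.items():
    print(f"{label:>12}: {likelihood:.2%}")
```

Change `MEAN_PER_YEAR` to 12 or 18 and the likelihoods shift substantially, which is a reminder of how much the result depends on a frequency that you are only guessing.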
Unfortunately, you are not usually given the long-term frequency. No-one will actually know it. The frequency is itself usually a subjective estimate, with its own range of uncertainty. You can save some time by simply estimating directly the likelihood of an annual minor injury count in each number range. Without any history, those direct likelihood estimates will be wildly unreliable. The resulting risk management decisions should err on the side of caution.
If there is a history of event instances, you can add some mathematical objectivity to your frequency or likelihood estimates. The key concepts are Poisson parameter estimation, Poisson parameter likelihood function, Poisson parameter confidence interval and Poisson prediction interval. It’s the prediction interval that turns a known history into a future prediction.
This pathway to ‘likelihood’ is more attractive if your history:
- includes a high total of event instances
- is known to be complete over the historical period, and
- represents a reasonable basis for predicting the future.
It will be a better basis for prediction if there are no important changes between the recorded past and the expected future.
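Here is a rough sketch of how a history turns into a prediction. It uses a normal approximation to the Poisson intervals, which is only reasonable when the history contains a high total of instances. Exact intervals need more machinery. The history is made up for illustration, and the function names are my own.

```python
import math

def estimate_rate(annual_counts):
    """Maximum-likelihood Poisson frequency: total instances / years observed."""
    return sum(annual_counts) / len(annual_counts)

def rate_confidence_interval(annual_counts, z=1.96):
    """Approximate 95% interval for the long-term frequency itself."""
    rate = estimate_rate(annual_counts)
    half_width = z * math.sqrt(rate / len(annual_counts))
    return max(0.0, rate - half_width), rate + half_width

def next_year_prediction_interval(annual_counts, z=1.96):
    """Approximate 95% interval for next year's count.

    The variance has two parts: random scatter in the coming year (rate)
    plus uncertainty in the estimated frequency (rate / years of history).
    """
    rate = estimate_rate(annual_counts)
    half_width = z * math.sqrt(rate * (1.0 + 1.0 / len(annual_counts)))
    return max(0.0, rate - half_width), rate + half_width

HYPOTHETICAL_HISTORY = [14, 17, 12, 19, 13]  # made-up annual injury counts

print("estimated frequency:", estimate_rate(HYPOTHETICAL_HISTORY))
print("interval for frequency:", rate_confidence_interval(HYPOTHETICAL_HISTORY))
print("interval for next year:", next_year_prediction_interval(HYPOTHETICAL_HISTORY))
```

The prediction interval is always wider than the confidence interval. Even a perfectly known frequency leaves next year's count uncertain.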
These mathematical methods can be extended to the total effect from multiple events of varying sizes, such as injuries with varying amounts of lost work time. They can be applied to many types of risk, including information security risk.
The techniques for totals are more complicated than the techniques for counts. They have had a lot of attention within the insurance industry. You might need Monte Carlo methods, but you can do the basic Monte Carlo work with a desktop spreadsheet.
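The same basic Monte Carlo work can be sketched in Python instead of a spreadsheet. This is an illustration, not a calibrated model: it assumes falls arrive at random at 15 per year, and that the work time lost per fall follows an exponential distribution with a mean of one day. Both figures, and the function names, are assumptions chosen for the example.

```python
import random

def simulate_annual_totals(rate_per_year, mean_days_lost, trials, seed=0):
    """Simulate total work days lost in a year, once per trial.

    Falls arrive as a Poisson process, generated from exponential gaps
    between falls. Days lost per fall are exponential (an assumption).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(trials):
        total = 0.0
        elapsed = rng.expovariate(rate_per_year)  # years until the first fall
        while elapsed < 1.0:
            total += rng.expovariate(1.0 / mean_days_lost)  # days lost, this fall
            elapsed += rng.expovariate(rate_per_year)       # gap to the next fall
        totals.append(total)
    return totals

def share_between(totals, low, high):
    """Fraction of simulated years with low <= total < high."""
    return sum(low <= total < high for total in totals) / len(totals)

totals = simulate_annual_totals(rate_per_year=15.0, mean_days_lost=1.0, trials=20_000)
print(f"25 days or more lost: {share_between(totals, 25.0, float('inf')):.1%}")
print(f"9 to 20 days lost:    {share_between(totals, 9.0, 20.0):.1%}")
```

Each simulated year is one trial. The share of trials landing in a range is your estimated likelihood for the risk described by that range, and it is only as good as the assumed frequency and loss distribution behind it.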
In depth: How to turn an event frequency into likelihood, using Monte Carlo trials in Excel.
Whenever calculations are involved, it is easy to hide wild guessing behind them. The illusion of precision is a potential problem throughout ‘risk management’. That isn’t always a reason to avoid numerical calculation, but it is a reminder to make clear to your decision-makers the subjective and uncertain origin of any numbers, especially numbers coming out of clever calculations.
Question for experts
Can a repeatable risk event have a likelihood, or have I nailed it?
This post was inspired by Pareek (2012), and by my own prolonged confusion when writing the last post on the ‘reasonable worst case’. The ‘reasonable worst case’ concept is not helpful for risks where the count or total is what matters, mainly because repeatable events have neither a likelihood nor a definite consequence. Conversely, this post on repeating events will work only when the events are all of the same kind, and do not have ‘case’ differences other than count or size.
I guess most of us have learned the risk re-description ‘trick’ one way or another. It’s only now that I’ve tried to explain it fully.