Extending the likelihood of a future count to the likelihood of a future total of randomly sized events.
So far we have estimated the likelihood of a future interval count of events within a range. In the first instance, it was assumed that the long-term frequency is known. At the second level, it was assumed that the long-term frequency is not known, but that there is a history that can be used for making estimates about the future.
To calculate the likelihood of a total within a range, you need at least a history of events and an assumed distribution for event sizes.
This series does not talk about the likelihood of a single event having a size within a target range. If you need to know that, you’re dealing with a non-repeatable event, probably an event with potentially extreme consequences on its own. If that’s where you are, you need a better understanding of the potential event size than you will find in this series.
Conceive a distribution for event sizes
In this article, event size and consequence size are assumed to refer to the same thing. If they are not the same thing in your case—in many cases they won’t be the same thing—just be clear about whether you’re totalling an event size metric or a consequence size metric. Stick with one consistent concept of ‘size’ throughout your process. 
You also need to assume a distribution for those event sizes, such as the Normal, Gamma, Exponential, or LogNormal distribution, with specific parameters (mean and variability). That is, you need to assume a pattern in which event sizes are distributed. In the real world you won’t know that distribution, but you can make different assumptions about it, and get some comfort from those variations.
The good news is that you don’t really need to understand much about those scary distributions. Here’s a short cut.
- Event sizes can’t usually be negative.
- The distribution of event sizes also usually has a right skew—a concentration of sizes within a typical range, with a progressively smaller number of events at much larger sizes.
- You can usually also assume that events much smaller than the typical size range are rare or can be ignored.
Based on all of those pragmatic assumptions, you might reasonably assume that event sizes follow a LogNormal distribution. You don’t need to know what a ‘LogNormal’ is to use it. You don’t need to know how the parameters affect it. You only need to know where you want the event sizes for two percentile points, such as the median (50th percentile) and the 95th percentile. Just look at the graph to get an idea of what it’s doing.
LogNormal probability distribution for event sizes. The mean and standard deviation parameters for the distribution were chosen to ensure a 50th percentile (median) event size of 30 and a 95th percentile size of 150. This is the example case in Pareek (2012). The LogNormal parameters themselves are Mean=3.401 (natural logarithm of 30), Standard Deviation=0.978.
These assumptions are reasonable when you have nothing much more to go on. When there is serious data, and when there are serious consequences from an inaccurate model, a more critical approach should be taken to selecting the size distribution (LogNormal, Gamma, Exponential, Pareto, etc.) and to estimating its parameters. This warning came out of community consultation on LinkedIn. Some experts argue against the assumption of a LogNormal distribution because it can understate the likelihood of unusually high event sizes when applied to real data: ‘…it is crucial not to use a probability distribution whose tail fades away to zero rapidly. Two notorious examples that have been used in the risk theory literature are Pareto and lognormal distributions…’ Gómez-Déniz and Calderín-Ojeda (2014): 13
Find the size distribution parameters
The rest of this page is a summary of a method set out by Pareek (2012), with some Clear Lines upgrades.
Pareek (2012) was an inspiration for the Clear Lines to tackle this topic. Pareek’s method has some limitations. It assumes that the long-term frequency is known, and that the event size distribution can be known from two hypothetical data points. Even with its dubious assumptions, the method may be good enough for practical risk decisions, where best guesses are usually all we have to work with anyway. The Clear Lines ‘upgrades’ are to recognise the uncertainty of the long-term event frequency inferred from a limited history, and to set out the Monte Carlo method for the joint distribution of event counts and sizes. It appears to the Clear Lines that Pareek also made an unnatural assumption that all events within a time interval would have the same size (within the size distribution). The Clear Lines do not make that assumption.
First, create a LogNormal distribution to represent event sizes. The Pareek method is to start by choosing a median (50th percentile) event size and a 90th-percentile event size (or any similar pair of percentiles). You then find the LogNormal distribution parameters (mean and standard deviation) that generate a distribution passing through those two size values at the chosen percentile points. Pareek uses an iterative (looping) search algorithm to find the parameters that fit the target values best. It is easy with the Excel Add-in ‘Solver’, as shown in Pareek (2012), Figure 1.
The process can work with more than two combinations of percentile and event size, though I’m not sure whether that helps. Usually, there will be no single LogNormal distribution that exactly matches more than two given points. I see more sense in repeating the whole exercise with different values for the two percentile points. A similar method can be used with distributions other than the LogNormal. 
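As it happens, when exactly two percentile points are given, the LogNormal parameters can be found without any iterative search, because each percentile gives a linear equation in the two parameters. A minimal Python sketch, using Pareek’s example targets (median 30, 95th percentile 150):

```python
import math
from statistics import NormalDist

# Two target points on the size distribution (Pareek's example case):
# median (50th percentile) = 30, 95th percentile = 150.
p50_size, p95_size = 30.0, 150.0

# For a LogNormal, ln(size) is Normal(mu, sigma), so each percentile point
# gives a linear equation: ln(x_p) = mu + sigma * z_p, where z_p is the
# standard Normal quantile. The median has z = 0, so mu = ln(median).
z95 = NormalDist().inv_cdf(0.95)          # ~1.645
mu = math.log(p50_size)
sigma = (math.log(p95_size) - mu) / z95

print(round(mu, 3), round(sigma, 3))      # → 3.401 0.978
```

This reproduces the parameters quoted under the distribution graph above. With more than two percentile points there is generally no exact solution, and a best-fit search (such as Excel Solver, as in Pareek’s Figure 1) is then the natural tool.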
Download a working LogNormal Parameters Excel Workbook (14 KB)
A lookup table for your event size distribution
This table is not necessary in Excel, because Excel has a built-in inverse LogNormal function, LOGNORM.INV().
Create a table of potential event sizes in increasing order, starting from zero. For each event size, calculate its cumulative probability, assuming the LogNormal distribution with the best-fit parameters. In other words, for each potential event size, calculate the probability of an event being that size or smaller.
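Outside Excel, the same lookup table can be built from the Normal CDF of the logarithm of each size. A sketch in Python, assuming the fitted parameters from the previous step (the size values in the table are arbitrary illustration points):

```python
import math
from statistics import NormalDist

mu, sigma = 3.401, 0.978           # fitted LogNormal parameters (assumed)
log_normal = NormalDist(mu, sigma)

# Table of (size, cumulative probability of that size or smaller).
# The LogNormal CDF at x is the Normal(mu, sigma) CDF at ln(x); the table
# starts just above zero because ln(0) is undefined (the probability at 0 is 0).
sizes = [1, 5, 10, 20, 30, 50, 100, 150, 300, 600]
table = [(s, log_normal.cdf(math.log(s))) for s in sizes]
for size, cum_p in table:
    print(f"{size:>4}  {cum_p:.3f}")
```

As a sanity check, the cumulative probability should be about 0.5 at the chosen median (30) and about 0.95 at the chosen 95th percentile (150).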
Include sizes in trials
Add a further stage to each trial, in which a size is selected for each individual event predicted in the trial. There may be zero events in some trials. The event size selection is made by:
- generating a third random selector (0 to 1) for each event,
- using that random value to pick a point within the cumulative probability for event sizes, then
- finding the event size corresponding to that cumulative probability in the table of potential event sizes.
Calculate the total size of all events predicted within each trial.
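The three selection steps above can be sketched as follows, assuming the fitted LogNormal parameters. Rather than a finite lookup table, this uses the inverse Normal CDF directly, which is the same calculation in the limit of a very fine table:

```python
import math
import random
from statistics import NormalDist

mu, sigma = 3.401, 0.978   # assumed LogNormal size parameters from the fitting step
std_normal = NormalDist()

def trial_total(event_count: int, rng: random.Random) -> float:
    """Total size for one trial: an independent size for each predicted event."""
    total = 0.0
    for _ in range(event_count):
        u = rng.random()                   # the third random selector (0 to 1)
        z = std_normal.inv_cdf(u)          # point on the cumulative probability curve
        total += math.exp(mu + sigma * z)  # the corresponding LogNormal event size
    return total

rng = random.Random(1)
print(trial_total(0, rng))  # a zero-event trial totals 0
print(trial_total(3, rng))
```

Note that each event gets its own random size, so a three-event trial draws three independent sizes; this is exactly the independence that Pareek’s same-size-per-interval simplification gives up.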

Pareek (2012) seems to have simplified this calculation by assuming that all events in any time interval would be of the same size. See Pareek’s note at the bottom of Figure 1, and Point 2 above Figure 1. That simplification looks questionable to the Clear Lines. It is unnatural, and it may generate too much variation in the trial totals. Alain Vandecraen (LinkedIn) identified some further problems in the Pareek article. Despite that Clear Lines concern, Pareek may have been right to offer the method as fit for purpose. 
Find the proportion of trials that will push your buttons
Decide on the range of total event size in the future interval for which you want a likelihood. Then count the proportion of trials with a total event size in that range of interest.
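Putting the stages together, here is an end-to-end sketch. It makes the Pareek-style simplifying assumption of a known long-term frequency (Poisson event counts); the Clear Lines upgrade would first draw the frequency itself from its uncertainty distribution. All numbers are hypothetical:

```python
import math
import random
from statistics import NormalDist

rng = random.Random(42)
std_normal = NormalDist()
mu, sigma = 3.401, 0.978      # assumed LogNormal size parameters
rate = 0.2                    # assumed known long-term events per time unit
trials = 4000

def poisson(lam: float, rng: random.Random) -> int:
    """Poisson count via Knuth's method: multiply uniforms until below e^-lam."""
    threshold, k, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def total_for_trial() -> float:
    count = poisson(rate, rng)
    return sum(math.exp(mu + sigma * std_normal.inv_cdf(rng.random()))
               for _ in range(count))

totals = [total_for_trial() for _ in range(trials)]

low, high = 100.0, 500.0      # the range of total size of interest (hypothetical)
in_range = sum(low <= t <= high for t in totals)
print(f"Likelihood of a total in [{low}, {high}]: {in_range / trials:.3f}")
```

With a rate of 0.2 events per unit, around 80% of trials have no events at all, matching the zero-total frequencies quoted under the histograms below.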
Conclusions you may draw
Some Clear Lines results are shown and discussed in a drill-down page. The drill-down detail is hard work, so just read on if you’re in a hurry.
Using Monte Carlo trials for event counts and sizes combined, you can test intuitions such as:
- When the event counts are low, the variation in event sizes is important.
- When the event counts are high, the variations in event sizes tend to cancel out – but will not always do so.
- An increase to the 95th percentile event size that ‘feels’ unimportant can make a big difference to the likelihood of an unacceptable total.
Frequency histogram for a future period (one time unit) total event size based on a history of one event in five time units, median event size 30, 95th percentile event size 40. The frequency of a 0 total (no events) is omitted from the graph—it was 0.69 or 69%. This histogram will move around for different runs of 4000 trials. The bumps arise when the pattern of totals is dominated by the event count, with event size variation having less effect on the annual total.
Frequency histogram for a future period (one time unit) total event size based on a history of 20 events in 100 time units, median event size 3, 95th percentile event size 50. Under these conditions the graph is dominated by the variations in event size. The event count is relatively predictable and smooth. The histogram resembles the dominating LogNormal distribution for event sizes. The numbers on the horizontal axis indicate the top of the size range represented by the vertical bar. The frequency of a 0 total—no events—is omitted from the graph. It was 0.8 or 80% of trials.
The Clear Lines suggest using these Monte Carlo trials to find the likelihoods of future period totals representing particular degrees of deviation from planned success. Some deviations can be tolerated with a high likelihood, while the likelihood of more extreme deviations must be kept very low. If the trials show an unacceptable likelihood of a particular level of total deviation, the risk is unacceptable, so a risk treatment must be chosen.
Pareek (2012) used the Monte Carlo trials method for calculating a 95% or 99% Value at Risk (VAR), as indicated in Figure 3 of that article. You can do that if you believe in VAR. I’m not sure that the Clear Lines believe in it. The Clear Lines Excel workbook includes both the 95% and 99% ‘VAR’ figures, among other percentile values for the total event size.
The assumptions behind your Monte Carlo trials are themselves uncertain. You can run the trials under different assumptions. You can look at the implications in each case. You can then make your real-world risk treatment decision based on the spread of implications over different plausible assumptions. The output illustrations show that slightly different assumptions can produce a big change in the likelihood of an extreme deviation from planned totals.
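One way to see that spread of implications is to hold everything fixed except a single assumption. This sketch re-runs the trials for several assumed 95th-percentile event sizes, with the median held fixed (all specific numbers are hypothetical):

```python
import math
import random
from statistics import NormalDist

std_normal = NormalDist()

def tail_likelihood(p95_size: float, threshold: float,
                    rate: float = 1.0, trials: int = 4000,
                    median: float = 30.0, seed: int = 0) -> float:
    """Proportion of trials whose total exceeds the threshold, for a given
    assumed 95th-percentile event size (median held fixed)."""
    rng = random.Random(seed)
    mu = math.log(median)
    sigma = (math.log(p95_size) - mu) / std_normal.inv_cdf(0.95)
    exceeded = 0
    for _ in range(trials):
        # Poisson event count via Knuth's method (fine for small rates)
        limit, count, product = math.exp(-rate), 0, rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        total = sum(math.exp(mu + sigma * std_normal.inv_cdf(rng.random()))
                    for _ in range(count))
        if total > threshold:
            exceeded += 1
    return exceeded / trials

for p95 in (100.0, 150.0, 200.0):     # alternative 95th-percentile assumptions
    print(p95, tail_likelihood(p95, threshold=300.0))
```

Moving the assumed 95th percentile from 100 to 200 typically multiplies the likelihood of exceeding the threshold many times over, which is the ‘dramatic effect’ warned about below.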
The likelihood of total event impact size within a range over a period may or may not reflect the same likelihood of a defined effect on enterprise objectives at the end of that period. That comes down to your conception of ‘impact’ and your conception of ‘enterprise objective’. At most one of those will be represented by your event size metric. Even if the Monte Carlo experiments are a real joy, there are many weak links in this chain of inference. They begin with the assumptions about the median and 95th percentile event sizes, the LogNormal distribution of those event sizes, and the validity of the event history for future projections. The doubtful assumptions continue with the effects of the total event size on enterprise objectives. The dramatic effect of moving the 95th percentile event size should be a warning. On the other hand, it is clearly better to work with dubious assumptions, knowing that they are dubious, than to ignore logic and numbers altogether.
Map of the series
Likelihood of a future event… 

Theory 
…size total within a range (this article)

Excel 
About the Excel implementation 
Download the complete Clear Lines Excel Workbook (17 MB)
Main article on repeating risk events and likelihood
References
Pareek, Mukul (2012) Using Scenario Analysis for Managing Technology Risk. ISACA Journal, Volume 6 of 2012.
Gómez-Déniz, E. and Calderín-Ojeda, E. (2014) Unconditional distributions obtained from conditional specification models with applications in risk theory. Scandinavian Actuarial Journal, 2014:7. Available in full text free online.
Drill-down articles
Risk specialists | Version 1.0 Beta
Previous article for risk specialists
Parent articles