Know the likelihood of any number of risk events next year, when you have limited history of those events.
Finding the likelihood of a future event count within a range, from long-term event frequency, was a lot of fun, obviously. But now things get serious.
It is important to distinguish between variability and uncertainty. Variability is the effect of chance and a function of the system. Additional data points don’t reduce the process variability. Uncertainty is a lack of knowledge about the parameters that characterize the physical system that is being modeled. This may be reducible with additional information. … An interval for a future payment (a prediction interval) must incorporate both the variability of the process and the uncertainty in the mean. A sum or even a linear combination of future payments will similarly incorporate process variability for each term, parameter uncertainty for each term, and parameter covariances between each pair of terms. Barnett, Odell and Zehnwirth (2008), 16-17. ‘Payments’ can be event counts, event sizes, or totals of event sizes.
In real-world risk assessment, you never know the long-term event frequency with certainty or precision. You may have to rely on history within your enterprise, or on history from elsewhere in your industry. Either way, you will get only a general idea of the long-term frequency with which the event type occurs, even from a meticulously complete history.
There are three stages to estimating the likelihood of an event count within a given range over a future time interval, starting from a history of events.
The first stage is estimating the long-term frequency from the history. This first stage results in a continuous distribution of possible frequencies, with varying likelihoods.
From here on, ‘likelihood’ is used in the statistical sense, distinct from probability. In statistics, hypothetical parameter values have likelihood, whereas random variable values have probability. In this case, the parameter is the unknown long-term frequency, and the random variable is the number of events in a time interval. The probabilities of all possible random variable values always add up to ‘certainty’ (1). The likelihoods of all possible parameter values do not. For inferences and predictions, the relative likelihood of different possible parameter values is more important than the absolute likelihood.
This distribution of frequencies can generate a confidence interval for the ‘true’ frequency (if such a thing can be said to exist). We will use the continuous distribution itself, not the confidence interval.
The second stage is calculating a joint distribution for event counts, given the likelihood distribution of possible frequencies (from the first stage), and the probability distribution of event counts for each possible frequency. This stage can produce a prediction interval for the event count in any future time interval. We will use the joint distribution directly to assess the likelihood of a count within a specified range, in the third stage, bypassing any particular prediction interval.
In the third stage, you set a range of interest for the event count in the future time interval, then calculate the likelihood of an event count within that range. This stage can be repeated for any number of ranges of interest. Your ‘range of interest’ is usually the range of event counts considered to have a particular consequence for enterprise objectives.
If all that sounds terribly complicated, it is. Especially if you want to understand and use the underlying formulae.
The better news is that basic Monte Carlo methods can do the hard work, and you can implement those basic Monte Carlo methods in a spreadsheet you can build yourself.
Monte Carlo methods use a large number of randomly varying trial scenarios to generate the relative frequency of particular outcomes. The relative frequency for an outcome in the trials replaces a calculated probability for that outcome. The random values within trials are drawn from defined distributions. In this case the defined source distributions are (a) the relative likelihood for each possible long-term event frequency and (b) the probability of a specific count of events, assuming a particular long-term frequency.
Another piece of good news is that the length of the history doesn’t need to be in exact multiples of the future interval for which you are making predictions. You just need to know the number of events in the history, the duration of the history (in any time units), and the duration of the future prediction interval (in the same time units). Neither duration need be a whole number.
First stage: the likelihood distribution
The first stage creates a likelihood distribution for candidate values of the long-term frequency. Likelihood distributions are explained at length in Wikipedia.
The likelihood function is defined as a function of the longterm frequency, with the observed number of historical events regarded as given, which it is.
Likelihood(frequency) ≅ Probability(observed events | frequency).
The symbol ≅ means ‘is defined as’ and | means ‘given’, so the formula above is rendered in English as
The likelihood of a long-term frequency is defined as the probability of the actual observed count given that long-term frequency.
The probability is given by the Poisson distribution, as it was when we thought we knew the long-term frequency.
Likelihood distribution for candidate long-term frequencies over the range 0 to 20, where there is a history of exactly 4 events in 1 time unit. Candidate frequencies above 20 per time unit have nonzero likelihood, just as it is barely possible to have 20 or more events in a time unit if the long-term average is 4.
If your history of events is spread over multiple time units, you can define the likelihood of a candidate frequency per time unit as:
Likelihood(frequency) ≅ Probability(observed events | frequency * history duration).
Here * means multiplication, as it does in Excel.
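As a rough illustration of the two formulas above, the following Python sketch evaluates the likelihood on a grid of candidate frequencies. It is not the Clear Lines Excel implementation: the function name `poisson_likelihood`, the grid of 401 candidates from 0 to 20, and the use of NumPy are all assumptions made for this example.

```python
import math
import numpy as np

def poisson_likelihood(freqs, events, duration=1.0):
    """Likelihood(frequency) = Probability(events | frequency * duration)."""
    mu = np.asarray(freqs, dtype=float) * duration   # expected count over the history
    safe_mu = np.where(mu > 0, mu, 1.0)              # placeholder value avoids log(0)
    # Poisson probability computed in logs to stay stable for larger counts.
    log_pmf = events * np.log(safe_mu) - safe_mu - math.lgamma(events + 1)
    # A frequency of exactly zero is only compatible with a history of zero events.
    at_zero = 1.0 if events == 0 else 0.0
    return np.where(mu > 0, np.exp(log_pmf), at_zero)

# Hypothetical grid of evenly spaced candidate frequencies (per time unit).
freqs = np.linspace(0.0, 20.0, 401)
like_short = poisson_likelihood(freqs, events=4, duration=1.0)
like_long = poisson_likelihood(freqs, events=40, duration=10.0)

print("peak for 4 events in 1 unit:", freqs[np.argmax(like_short)])
print("peak for 40 events in 10 units:", freqs[np.argmax(like_long)])
```

Both histories have the same average rate, so both curves should peak at the same candidate frequency; the longer history should simply give a narrower curve.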
Likelihood distributions for an unknown parameter resemble probability distributions but there are key differences, both metaphysical and numerical. The unknown parameter here is the ‘true’ long-term frequency of risk events. For a start, you must arbitrarily pick a series of candidate values for the long-term frequency at which to calculate likelihood, with a highest and lowest value. Another difference is that the total likelihood under the graph does not add up to 1, as it always does for a probability distribution. If the likelihood function for a candidate frequency calculates to 0.25 (or any other nonzero number), that does not imply a 25% chance that the true long-term frequency matches the candidate frequency exactly. There are infinitely many long-term frequencies with similar likelihood values, or even greater likelihood values.
For the next stage, you will need a scaled version of the cumulative likelihood function, which approaches an upper limit of 1. To achieve that, ensure your candidate values for the frequency are in ascending order and evenly spaced. At each candidate value for the frequency, calculate the likelihood for that frequency, and add it to the cumulative likelihood for all lower candidate frequencies. When the cumulative likelihood has been calculated for all candidate values, scale all the cumulative likelihoods such that the highest cumulative likelihood equals 1.
The Clear Lines Excel method converts likelihoods into relative likelihoods (totalling 1), and the key formula allows for unevenly spaced candidate frequencies (in ascending order). It actually uses only evenly spaced candidate frequencies. 
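The cumulative scaling described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workbook's formula: the function name `scaled_cumulative_likelihood` and the grid of 2001 evenly spaced candidates from 0 to 20 are hypothetical choices.

```python
import math
import numpy as np

def scaled_cumulative_likelihood(freqs, events, duration=1.0):
    """Running total of likelihood over ascending, evenly spaced candidates,
    rescaled so that the final (highest) value equals 1."""
    mu = np.asarray(freqs, dtype=float) * duration
    safe_mu = np.where(mu > 0, mu, 1.0)              # placeholder value avoids log(0)
    log_pmf = events * np.log(safe_mu) - safe_mu - math.lgamma(events + 1)
    like = np.where(mu > 0, np.exp(log_pmf), 1.0 if events == 0 else 0.0)
    cumulative = np.cumsum(like)                     # add each likelihood to all lower ones
    return cumulative / cumulative[-1]               # scale so the top value is 1

freqs = np.linspace(0.0, 20.0, 2001)                 # assumed candidate grid
cumulative = scaled_cumulative_likelihood(freqs, events=4, duration=1.0)

# First candidate frequency at which the scaled total reaches one half.
halfway = freqs[np.searchsorted(cumulative, 0.5)]
print("scaled cumulative likelihood reaches 0.5 at frequency", halfway)
```

For a history of 4 events in 1 time unit, the halfway point should sit somewhat above 4, because the likelihood curve has a longer tail to the right than to the left.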
You can plot your own likelihood functions to see what they look like, with candidate frequencies on the horizontal axis and likelihood on the vertical. The scale on the vertical axis is not important. The plot will peak at the average frequency over the known history, and there will be a tail out to the right. If the history contains an absolutely high number (>20) of events, regardless of duration, the peak will be narrow, and the visible tail will be short. If the history contains few events, regardless of duration, the peak will be wide and the tail long. Zero is a legitimate number of events in the history.
Likelihood distribution for candidate long-term frequencies over the range 0 to 200, where there is a history of exactly 50 events in 1 time unit. The plausible range for frequencies is relatively narrow and symmetrical, compared with the same graph for 4 events in the history.
A more sophisticated approach is possible if you have separate event counts for subsets of the historical period. The method above produces the same results if all of the history is added together as a single block of time. 
Second stage: trials
In the second stage, there are thousands of trials, each one representing a potential scenario for the future prediction interval.
Each trial randomly selects one candidate value for the long-term frequency, then randomly selects an event count for the future interval based on that chosen value for long-term frequency.
After those two selections, each trial produces an event count.
Both selections in each trial are random, but not from a uniform or undefined distribution. Both are taken according to a well-defined non-uniform distribution.
 The probability of any given candidate value for the frequency being chosen for the trial is proportional to its likelihood, as calculated within the likelihood distribution from the first stage.
 The event count chosen for the future interval has a Poisson distribution, driven by the candidate frequency chosen for the trial. Each possible event count is selected with the probability given by that Poisson distribution.
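The two selections in each trial can be sketched as follows. This is a minimal Python illustration, not the Clear Lines spreadsheet: the function name `run_trials`, the 20,000 trials, the grid size, and the rule used to pick the highest candidate frequency are all assumptions made for the example.

```python
import math
import numpy as np

def run_trials(events, history, future, n_trials=20000, n_grid=4001, seed=0):
    """Return (trial frequencies, trial event counts) for the future interval."""
    rng = np.random.default_rng(seed)
    # Assumed upper limit for candidates: generous enough to cover the right tail.
    top = (events + 1 + 10 * math.sqrt(events + 1)) / history
    freqs = np.linspace(0.0, top, n_grid)
    mu = freqs * history
    safe_mu = np.where(mu > 0, mu, 1.0)
    log_pmf = events * np.log(safe_mu) - safe_mu - math.lgamma(events + 1)
    like = np.where(mu > 0, np.exp(log_pmf), 1.0 if events == 0 else 0.0)
    cumulative = np.cumsum(like)
    cumulative /= cumulative[-1]
    # Selection 1: a uniform random number looked up in the scaled cumulative
    # likelihood picks each candidate with probability proportional to its likelihood.
    picks = np.searchsorted(cumulative, rng.random(n_trials), side="right")
    trial_freqs = freqs[picks]
    # Selection 2: a Poisson count for the future interval, driven by that frequency.
    trial_counts = rng.poisson(trial_freqs * future)
    return trial_freqs, trial_counts

trial_freqs, trial_counts = run_trials(events=4, history=1.0, future=1.0)
print("mean trial frequency:", trial_freqs.mean())
print("mean trial count:", trial_counts.mean())
print("variance of trial counts:", trial_counts.var())
```

The lookup against the scaled cumulative likelihood plays the same role as a lookup against a cumulative column in a spreadsheet. The variance of the trial counts should exceed their mean, which is the extra spread contributed by not knowing the frequency.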
As at 11 December 2018, this method has been questioned in community consultation on LinkedIn. There will be updates here when the questions are resolved. 
With some help from Google, I found a simple formula that combines the first and second stages and apparently avoids all of this Monte Carlo mess. I also found that the formula can produce nonsensical results, so while I respect the effort, I am not recommending it as a simple alternative. I expose details in a footnote. 
So here are the charts.
Distribution of candidate long-term frequencies used in trials, for a future period of one time unit, given a historical period of one time unit containing 4 events. Each dot represents the proportion of all trials (vertical axis) with a candidate frequency matching the value on the horizontal axis. As exact matches of continuous variables are impossible, candidate frequencies are grouped into ranges to create this chart. For this chart, 20 equally sized ranges were defined. This distribution is intended to match the relative likelihood for candidate frequencies. In practice, there will be random noise, appearing as a lumpy graph.
Cumulative distribution of candidate frequencies used in trials, for a future period of one time unit, given a historical period of one time unit containing 4 events. Each dot represents the proportion of all trials (vertical axis) with a candidate frequency equal to or lower than the candidate frequency on the horizontal axis. The same 20 ranges were used as for the previous chart.
A typical distribution of event counts generated in Monte Carlo trials, for a future period of one time unit, given a historical period of one time unit containing 4 events. The distribution should follow closely the distribution of candidate long-term frequencies chosen in trials, with some random noise. There were a few trials with event counts higher than 20, not shown in the chart.
A typical cumulative distribution of event counts generated in Monte Carlo trials, for a future period of one time unit, given a historical period of one time unit containing 4 events. Each dot represents the proportion of all trials (vertical axis) with a trial event count equal to or lower than the event count on the horizontal axis. The distribution should follow closely the cumulative distribution of candidate long-term frequencies chosen in trials, with some random noise.
Comparing the trial distributions with the simple Poisson distribution
When the frequency is inferred from a finite history, trial distributions are always wider than the Poisson distribution for a fixed long-term frequency. The widening is created by the uncertainty around the true long-term frequency. The central peak is correspondingly lower.
The solid blue bars show the distribution of event counts in trials, with frequencies inferred from a finite history of 4 events in exactly one time unit. The striped green bars show the Poisson probability distribution for a fixed long-term frequency of the same 4 events per time unit. The distribution of trial event counts is wider and has a lower peak, representing the extra uncertainty around the ‘true’ long-term event frequency.
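The widening and the lower peak can be checked numerically. The sketch below is only an illustration: the helper `simulate_counts`, its grid rule, and the trial count are assumptions, and the fixed-frequency comparison is drawn directly from NumPy's Poisson generator.

```python
import math
import numpy as np

def simulate_counts(events, history, future, n_trials=20000, n_grid=4001, seed=0):
    """Hypothetical helper: pick a frequency by likelihood, then a Poisson count."""
    rng = np.random.default_rng(seed)
    top = (events + 1 + 10 * math.sqrt(events + 1)) / history   # assumed grid limit
    freqs = np.linspace(0.0, top, n_grid)
    mu = freqs * history
    safe_mu = np.where(mu > 0, mu, 1.0)
    log_pmf = events * np.log(safe_mu) - safe_mu - math.lgamma(events + 1)
    like = np.where(mu > 0, np.exp(log_pmf), 1.0 if events == 0 else 0.0)
    cumulative = np.cumsum(like)
    cumulative /= cumulative[-1]
    picks = np.searchsorted(cumulative, rng.random(n_trials), side="right")
    return rng.poisson(freqs[picks] * future)

n = 20000
inferred = simulate_counts(events=4, history=1.0, future=1.0, n_trials=n)
fixed = np.random.default_rng(1).poisson(4.0, size=n)   # frequency known to be 4

for name, counts in (("inferred from 4 events", inferred), ("fixed at 4", fixed)):
    tallest = np.bincount(counts).max() / n              # height of the tallest bar
    print(f"{name}: variance {counts.var():.2f}, tallest bar {tallest:.3f}")
```

The inferred-frequency trials should show the larger variance and the shorter tallest bar, matching the blue and green bars described in the caption.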
Third and final stage: likelihoods for ranges
In the third stage you specify the event count range of interest, and get the likelihood (relative frequency) of a future-interval event count within that range.
The likelihood of an event count in the range of interest is indicated by the proportion of trial event counts that lie within that range. Charts can fairly label the proportion of trials with a count in a range as the ‘likelihood’ of a count in that range.
You can also use the method to find the count range within which a given proportion of possible futures will land. For example, you can find the narrowest count range starting at zero within which 95% of all trial counts land. That count range represents something like a 95% Value at Risk [VAR], if you were to take the ‘count’ as a substitute for ‘value’.
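Both third-stage calculations reduce to counting trials. The following Python sketch is one way to do it, under assumptions: the names `simulate_counts`, `likelihood_in_range` and `upper_count_for_coverage` are hypothetical, and the trial generator is a compact stand-in for the first two stages rather than the Clear Lines workbook.

```python
import math
import numpy as np

def simulate_counts(events, history, future, n_trials=20000, n_grid=4001, seed=0):
    """Hypothetical helper: pick a frequency by likelihood, then a Poisson count."""
    rng = np.random.default_rng(seed)
    top = (events + 1 + 10 * math.sqrt(events + 1)) / history   # assumed grid limit
    freqs = np.linspace(0.0, top, n_grid)
    mu = freqs * history
    safe_mu = np.where(mu > 0, mu, 1.0)
    log_pmf = events * np.log(safe_mu) - safe_mu - math.lgamma(events + 1)
    like = np.where(mu > 0, np.exp(log_pmf), 1.0 if events == 0 else 0.0)
    cumulative = np.cumsum(like)
    cumulative /= cumulative[-1]
    picks = np.searchsorted(cumulative, rng.random(n_trials), side="right")
    return rng.poisson(freqs[picks] * future)

def likelihood_in_range(counts, low, high):
    """Proportion of trials with low <= count <= high."""
    return np.mean((counts >= low) & (counts <= high))

def upper_count_for_coverage(counts, coverage=0.95):
    """Smallest count c such that at least `coverage` of trials fall in 0..c."""
    ordered = np.sort(counts)
    index = math.ceil(coverage * len(ordered)) - 1
    return int(ordered[index])

counts = simulate_counts(events=4, history=1.0, future=1.0)
p_six_plus = likelihood_in_range(counts, 6, counts.max())
upper = upper_count_for_coverage(counts, 0.95)
print("likelihood of 6 or more events:", p_six_plus)
print("narrowest range from zero holding 95% of trials: 0 to", upper)
```

The same `counts` array can be reused for any number of ranges of interest, which is why the third stage is cheap to repeat once the trials exist.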
Conclusions you may draw
When you try some different scenarios, you will see what you guessed all along. Whatever the long-term frequency of events, your predictions will be more concentrated in a likely range when you know about a lot of those events, over any historical period. If you have only a small number of events to go on (or even none), the trial results will spread out over a wide range of possible future event counts. You can try this by comparing the distribution of future interval counts for the cases in this table.
| | Case A | Case B | Case C | Case D |
|---|---|---|---|---|
| Case inputs | | | | |
| Fixed long-term event frequency | 4 | | | |
| Events in history | | 40 | 4 | 1 |
| Duration of history | | 10 | 1 | 0.25 |
| Duration of future interval | 1 | 1 | 1 | 1 |
| Case outputs | | | | |
| Most likely number of future events | 4 | 4 | 4 | 4 |
| 5th percentile of trial event counts (lower end of 90% prediction interval) | 1 | 1 | 1 | 1 |
| 95th percentile of trial event counts (upper end of 90% prediction interval) | 8 | 8 | 11 | 20 |
| Proportion of trials with 0 events (likelihood of 0 events) | 1.83% | ~2% | ~3% | ~4% |
| Proportion of trials with event count 6 or more (likelihood of 6 or more events) | 21.49% | ~23% | ~38% | ~58% |
The rate of events within the history is the same (4 per time unit), but you have much less confidence and precision in your estimate for the future when there is just one event in the history (Case D) than when there are 40 (Case B), or when you just knew the long-term frequency to be 4 per time unit (Case A).
To look at it like a risk analyst, a high event count with severe consequences becomes more likely if you don’t have a long experience of those events. A good history tells you when those events didn’t occur, just as much as it tells you when they did. A lucky year with no events also looks more likely when you have had little experience to make you sad or wise.
You may also want to explore the pattern of future event counts when you have 0 events in your history, but varying amounts of that uneventful history.
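One way to run that exploration is sketched below in Python. It is an illustration only: the helper `simulate_counts`, its grid rule, and the particular history durations tried are assumptions, not values from the Clear Lines workbook.

```python
import math
import numpy as np

def simulate_counts(events, history, future, n_trials=20000, n_grid=4001, seed=0):
    """Hypothetical helper: pick a frequency by likelihood, then a Poisson count."""
    rng = np.random.default_rng(seed)
    top = (events + 1 + 10 * math.sqrt(events + 1)) / history   # assumed grid limit
    freqs = np.linspace(0.0, top, n_grid)
    mu = freqs * history
    safe_mu = np.where(mu > 0, mu, 1.0)
    log_pmf = events * np.log(safe_mu) - safe_mu - math.lgamma(events + 1)
    like = np.where(mu > 0, np.exp(log_pmf), 1.0 if events == 0 else 0.0)
    cumulative = np.cumsum(like)
    cumulative /= cumulative[-1]
    picks = np.searchsorted(cumulative, rng.random(n_trials), side="right")
    return rng.poisson(freqs[picks] * future)

# Zero events in the history, with increasing amounts of uneventful history.
durations = [0.5, 1.0, 2.0, 5.0, 10.0]        # assumed durations, in time units
p_zero = {}
for history in durations:
    counts = simulate_counts(events=0, history=history, future=1.0)
    p_zero[history] = np.mean(counts == 0)
    print(f"history {history:>4}: proportion of trials with 0 future events "
          f"{p_zero[history]:.3f}, highest trial count {counts.max()}")
```

The proportion of event-free trials should rise as the uneventful history gets longer, but a short uneventful history should still leave a sizeable share of trials with one or more events.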
If you like this game and want to see how it works visually, create some frequency distribution histograms for your trial counts. You can choose to show the proportion, cumulative proportion, or both on the one chart. The matching chart within the Clear Lines Excel workbook is on the Master Charts sheet, Distribution of counts in trials (XY dots). It’s easier to show cumulative proportions with XY dots than with histogram columns. You can watch the graph narrow and widen with differing amounts of history containing the same average event frequency. 
Count ranges and risk decisions
A previous post discussed the likelihood of count ranges from minor injuries caused by workers falling down stairs. In the example, the enterprise was expecting 10-19 minor injuries from staircase falls in the next year, assessing an 81% likelihood. An 81% likelihood of 10-19 injuries was considered acceptable. The corresponding likelihood of a ‘disappointing’ or ‘bad’ (25 or more injuries) year was assessed at 1%, which was also deemed acceptable.
Those likelihoods were calculated from an assumed long-term event frequency of 15 per year.
But then suppose that the frequency is not known, but is inferred from a short history, containing 5 such injuries in 4 months. The indicated frequency is still 15 per year, but that frequency is surrounded by uncertainty. The method in this article can be used to take the uncertainty into account.
After taking the uncertainty of the long-term frequency into account, the likelihood of a disappointing or bad outcome is over 18%, not below 1%, as it was when an injury frequency was assumed.
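The staircase comparison can be approximated with a short Python sketch. It is a stand-in for the spreadsheet, under assumptions: the helper `simulate_counts`, its grid rule and trial count are hypothetical, durations are expressed in years, and results from a run will vary with those choices rather than reproduce the article's figures exactly.

```python
import math
import numpy as np

def simulate_counts(events, history, future, n_trials=20000, n_grid=4001, seed=0):
    """Hypothetical helper: pick a frequency by likelihood, then a Poisson count."""
    rng = np.random.default_rng(seed)
    top = (events + 1 + 10 * math.sqrt(events + 1)) / history   # assumed grid limit
    freqs = np.linspace(0.0, top, n_grid)
    mu = freqs * history
    safe_mu = np.where(mu > 0, mu, 1.0)
    log_pmf = events * np.log(safe_mu) - safe_mu - math.lgamma(events + 1)
    like = np.where(mu > 0, np.exp(log_pmf), 1.0 if events == 0 else 0.0)
    cumulative = np.cumsum(like)
    cumulative /= cumulative[-1]
    picks = np.searchsorted(cumulative, rng.random(n_trials), side="right")
    return rng.poisson(freqs[picks] * future)

# Frequency inferred from 5 injuries in 4 months, predicting the next full year.
inferred = simulate_counts(events=5, history=4 / 12, future=1.0)
# Frequency simply assumed to be 15 per year.
assumed = np.random.default_rng(1).poisson(15.0, size=inferred.size)

p_bad_inferred = np.mean(inferred >= 25)
p_bad_assumed = np.mean(assumed >= 25)
print("likelihood of 25 or more injuries, frequency inferred:", p_bad_inferred)
print("likelihood of 25 or more injuries, frequency assumed:", p_bad_assumed)
```

The point of the comparison is the gap between the two proportions: the same indicated rate of 15 per year produces a far heavier upper tail once the frequency is treated as uncertain.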
In many organisations a 1% likelihood of a disappointing outcome would be acceptable, but an 18% likelihood of such an outcome would not be acceptable. The enterprise would then make a different decision about accepting or treating the risk. The different decision would be based on having more or less information. The risky stairs themselves do not change.
The striped green bars show the Poisson distribution of next year’s minor injury counts based on an assumption of 15 per year. The solid blue bars show the distribution of counts based on trials inferring a long-term frequency from a history of 5 injuries in 4 months. The areas where solid blue stands taller than striped green represent the increase in risk resulting from uncertainty. The vertical lines correspond to boundaries between consequence levels for the enterprise.
Footnotes
The Wikipedia page for Poisson distribution, under the heading Confidence interval, shows a credible formula for the confidence interval for a long-term frequency, based on a known count in a known historical period. My Monte Carlo trials aligned roughly with the predictions of that formula. The difference was that my Monte Carlo spreadsheet trials consistently produced fewer candidate frequencies below the lower end of the confidence interval than would be predicted by the formula. There was no such discrepancy at the high end of the interval. Any thoughts on this discrepancy would be of great interest to the Clear Lines.
With some Googling I found a concise and apparently understandable formula for a prediction interval in Luko and Neubauer (2014), properly called the ‘normal approximation’. The formula is labelled (2) on page 14. The original source may have been Hahn and Meeker (1991). However, the formula is an approximation that can produce crazy results outside of its comfort zone. In a later edition, Hahn, Meeker & Escobar (2017) advised that
The normal-approximate method [repeated by Luko and Neubauer (2014)] requires that [the number of historical events and the forecast count for the future period] both be 30 or larger for the coverage probabilities to be reliably close to the…confidence level. (7.6.5).
Hahn, Meeker & Escobar (2017) gives the original source for the normal approximation formula as Nelson (1970).
Regardless of the limits of the approximation, when I applied the formula to the example in the Luko and Neubauer article, my answer did not match theirs. To match theirs (exactly) I omitted the Z factor (1.96 for α=0.05) entirely, and thereby assumed it was unity (corresponding to an absurdly high α). I did not see a correction in the next issue of ASTM Standardization News.
My Monte Carlo spreadsheet trials match the normal approximation, but only approximately, as expected. The proportion of trial counts inside the formula confidence interval was typically just below the expected 1-α. The exceptions were uneven, with ‘too few’ trials below the low end, and ‘too many’ above the high end. The problem is not the trials, it’s the approximate formula for the interval.
Partially valid approximations pop up regularly in practical statistics, often without an acknowledgement of limitations. Warning signs are reliance on the normal (Z) or chi-squared (χ²) distributions, or the appearance of a ± symbol. I have noted that the ‘credible’ Wikipedia formula for the confidence interval has one of the warning signs on it, reliance on χ², but I’m not jumping to conclusions.
References
Barnett G., Odell D., and Zehnwirth B. (2008) Meaningful Intervals Casualty Actuarial Society Forum, Fall 2008 (full text free online, sometimes).
Hahn, G. J., and Meeker, W. Q. (1991) Statistical Intervals: A Guide for Practitioners, Wiley-Interscience, John Wiley and Sons Inc., New York, N.Y.
Hahn, G. J., Meeker, W. Q. and Escobar, L. A. (2017) Statistical Intervals: A Guide for Practitioners and Researchers, Second Edition, John Wiley & Sons, Inc., Hoboken, New Jersey.
Luko, Stephen N. and Neubauer, Dean V. (2014) Statistical Prediction Intervals: Applying the intervals to attribute data, ASTM Standardization News, March-April 2014. This article references previous articles in the same journal in 2011, and the 2011 articles refer to authoritative sources, such as Hahn and Meeker (1991).
Nelson, W. (1970) Confidence intervals for the ratio of two Poisson means and Poisson predictor intervals, IEEE Transactions on Reliability R-19, 42-49 (not sighted).
Pareek, Mukul (2012) Using Scenario Analysis for Managing Technology Risk. ISACA Journal, Volume 6 of 2012.
Risk specialists  Version 1.0 Beta 