# Likelihood of a future event count within a range, from a history of events: in Excel

 What to read first: Likelihood of a future event count within a range, from a history of events Risk specialists Version 1.0 Beta

How you can run Excel Monte Carlo trials to find the likelihood of a future event count within a range, from a history of events.

First stage: the likelihood distribution

To generate the likelihood distribution, create a list of candidate values for the long-term frequency, expressed as the average event count per time unit. A reasonable initial range for your candidate values is from zero to ten times the average frequency within the history. The spacing between adjacent candidate frequency values should be tight, perhaps the range divided by 10000, so you get 10001 candidate values starting with 0. (Zero is not a real candidate, even if your history contains zero event occurrences.) Name your list column CandidateFrequencies_CandidateFrequency (this is a Named range, created with the Name Manager).

For each candidate value, calculate the probability of the exact number of actual historical events, assuming that the real long-term frequency matches the candidate value. In Excel, that’s

=POISSON.DIST(historicaleventcount,candidatefrequency*history duration,FALSE)

This probability is called the likelihood function of the long-term frequency, and you have generated its likelihood distribution. It includes an adjustment for the duration of the history, which may not be one time unit. Call this column CandidateFrequencies_Likelihood.

You now have a list of 10001 points in the likelihood distribution, each with a separate likelihood.

Later stages use a cumulative version of this function. Add a column representing the cumulative likelihood (starting at 0 for candidate frequency 0), and a further column representing relative cumulative likelihood, with an accumulation of exactly 1 at the end. Name these columns CandidateFrequencies_CumulativeLikelihood and CandidateFrequencies_RelativeCumulativeLikelihood respectively.

In the Cumulative Likelihood column, use:

=<cell above>+(CandidateFrequencies_Likelihood*(candidate frequency-previous candidate frequency))

In the top cell, just put 0, not a formula.

Multiplying by the difference between candidate frequencies allows for uneven spacing of the candidate frequencies.

In the relative cumulative likelihood column, use:

=CandidateFrequencies_CumulativeLikelihood/MAX(CandidateFrequencies_CumulativeLikelihood)

That creates your cumulative relative likelihood that starts and runs up to 1, assuming that your range of candidate frequencies covers virtually all plausible values.

Second stage: trials

 As at 11 December 2018, this method has been questioned in community consultation on LinkedIn. There will be updates here when the questions are resolved.

The spreadsheet implementation of this stage is a bit messy. You may have your own way to do it. I’ll give the key points in my method, in case you haven’t thought of those.

Build on the workbook created for the first stage. In that workbook, name your first sheet ‘Candidate Frequencies’.

Add a further sheet called ‘Control Panel’. Within the Control Panel, name three cells ControlPanel_HistoryCount, ControlPanel_HistoryDuration and ControlPanel_FutureDuration respectively. Put in some values representing the number of events in the history, the length of the history, and the length of the future prediction interval (in the same time units).

Create a second workbook sheet for trials, with one trial per row. Call it ‘Trials’. You can put trial reference numbers in the first column. Eventually you will want thousands of trial rows, but you can start small and grow later.

In the new Trials sheet, add a column for generating for each trial a simple random number from between 0 and 1, with uniform probability distribution: =RAND(). This random value is used to select the candidate frequency for the trial. Name this column Trials_FrequencySelector.

Add a further column that finds the row in the candidate frequency sheet containing the lowest relative cumulative likelihood higher than the random selector value from the column to the left:

=MATCH(Trials_FrequencySelector,CandidateFrequencies_RelativeCumulativeLikelihood,1)

The third parameter of MATCH(), the ‘1’, just tells Excel to look for the first value higher than the random selector value. Name this column Trials_RowInCandidateFrequencies.

Add a column to look up the candidate frequency in that matched row:

=INDEX(CandidateFrequencies_CandidateFrequency,Trials_RowInCandidateFrequencies+1)

The +1 is necessary because The Excel MATCH() function counts the first cell as 0, whereas INDEX() counts the first cell as 1.

Call this column Trials_TrialCandidateFrequency. This value is the candidate frequency for the trial row.

Add a column containing a further random generator =RAND(). This random value is used to select the trial event count from within the Poisson distribution for the trial’s candidate frequency. Name this column Trials_EventCountSelector.

Leave two columns spare for filling later. You next create 101 further columns representing all potential event counts for the trial, starting at zero. So put the potential count (0..100) at the top of each column. Name that row Trials_PotentialCounts.

Also name the 101-column range Trials_PotentialEventCountColumns. While you’re there, also name the left-most column of that area Trials_FirstPotentialCountColumn (that’s the column representing 0 events).

In each cell within the added grid, calculate the cumulative probability of the event count for the column, given the long-term frequency for the trial:

=POISSON.DIST(Trials_PotentialCounts, Trials_TrialCandidateFrequency*ControlPanel_FutureDuration,TRUE)

You might pick a range of potential event counts other than 0-100. Whatever the range of potential event counts, it must run much higher than a simple extrapolation of the history.

You find the event count for the trial by matching the random value in Trials_EventCountSelector with one of the cumulative probability cells in the row. The event count for the trial is shown at the top of the column containing that cell.

To do this in Excel, go back to the two columns left spare. In the left one, identify the cell within the current row containing the lowest cumulative probability that is higher than the random number selector for the count. You can do this with the MATCH() function.

=IF(Trials_EventCountSelector<Trials_FirstPotentialCountColumn,0,MATCH(Trials_EventCountSelector,3:3 Trials_PotentialEventCountColumns,1))

The 3:3 Trials_PotentialEventCountColumns refers to the intersection of row 3 (the current row in my example) with the named area Trials_PotentialEventCountColumns. The 3:3 will remain the current row when you fill down further rows. Call this column Trials_ColumnForEventCount. The enclosing IF() is needed because of the way the Excel MATCH() function works. The value displayed is the number of the column containing the matched probability, starting at 0 for the first column of cumulative probabilities.

In the remaining spare column, look up the potential event count for the matched probability cell, at the top of the matched column:

=INDEX(Trials_PotentialCounts PotentialCountColumns, Trials_ColumnForEventCount+1)

This cell represents the trial outcome, the event count. Name this column Trials_TrialEventCount.

Third and final stage: likelihoods for ranges

In your Control Panel sheet, add two cells to hold the lower and upper end of the range of interest for the future period event count (the ‘target range’). Call these cells ControlPanel_CountTargetRangeLowEnd and ControlPanel_CountTargetRangeHighEnd.

Also in the Control Panel sheet, add a cell containing the number of trial rows. Name this cell ControlPanel_NumberOfTrials.

In the Trials sheet, add three further columns containing TRUE/FALSE values. The first column is TRUE if the trial row’s event count is below the low end target range. The second column is TRUE if the trial row’s event count is above the high end of the target range. The third column is TRUE if neither of the previous conditions apply. Name these columns Trials_CountBelowTargetRange, Trials_CountAboveTargetRange, and Trials_CountInTargetRange. The formulae can be:

=Trials_TrialEventCount<ControlPanel_CountTargetRangeLowEnd

=Trials_TrialEventCount>ControlPanel_CountTargetRangeHighEnd

=AND(NOT(Trials_CountBelowTargetRange),NOT(Trials_CountAboveTargetRange))

Back in the Control Panel sheet (or wherever you prefer), calculate the proportion of cells in the Trials_CountInTargetRange that have the value TRUE.

A simple formula to do that is

=COUNTIF(Trials_CountInTargetRange,TRUE)/ControlPanel_NumberOfTrials

To allow for an open-ended target range, the Clear Lines used this long but straightforward variation:

=IF(ISBLANK(ControlPanel_CountTargetRangeHighEnd),
1-COUNTIF(Trials_CountBelowTargetRange,TRUE)/ControlPanel_NumberOfTrials,
COUNTIF(Trials_CountInTargetRange,TRUE)/ControlPanel_NumberOfTrials
)

Map of the series

 Likelihood of a future event… Theory How to turn an event frequency into likelihood ➜ …count within a range, from a long-term event frequency ➜ …count within a range, from a history of events ➜ …size total within a range Excel About the Excel implementation …in Excel. …in Excel. This article …in Excel.