Simplifying the statistics behind the Elusion Test

Written by Altlaw

The Elusion Test is a validation test run at the stabilisation point of a Technology Assisted Review (TAR). It estimates how accurately your active learning algorithm has identified relevant documents. There are two types of Elusion Test, so you can run a test suited to your data and desired outcomes:

• Fixed Testing – A test sample of a specified size is created.

• Statistical Testing – A random test sample is created where the size of the sample is dependent on a given Confidence and Margin of Error.
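To make the link between confidence, margin of error and sample size concrete, here is a minimal sketch in Python. It assumes the textbook sample-size formula for a proportion (Cochran's formula with a finite population correction and a worst-case proportion of 50%). The function name `sample_size` and its parameters are illustrative only, and this is not necessarily the calculation RelativityOne performs.

```python
import math
from statistics import NormalDist


def sample_size(population: int, confidence: float, margin_of_error: float,
                proportion: float = 0.5) -> int:
    """Estimate how many documents to sample from the discard pile.

    Assumption: Cochran's formula with a finite population correction.
    proportion=0.5 is the most cautious choice (it gives the largest sample).
    """
    # z-score for a two-sided interval at the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Shrink the sample for a finite discard pile
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# A discard pile of 10,000 documents at 95% confidence and a 5% margin of error
print(sample_size(10_000, 0.95, 0.05))
# A higher confidence level or a tighter margin of error needs a larger sample
print(sample_size(10_000, 0.99, 0.05))
print(sample_size(10_000, 0.95, 0.02))
```

Under these assumptions, raising the confidence level or narrowing the margin of error both increase the number of documents that must be test reviewed.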

Whichever test you choose, the confidence level and margin of error are crucial to interpreting the results of your Elusion Test, yet their meaning is not always clear. To understand what they refer to here, we must first understand that the Elusion Test is run on a sample of documents taken from the ‘discard pile’ created by the active learning algorithm. The discard pile is made up of all the documents to which the algorithm has given a relevancy score below your desired cutoff. None of these documents has been reviewed, so we as reviewers have no idea what the document landscape of this pile may look like. This is where confidence level and margin of error come into play.

Figure 1: A random sample of documents is taken from the discard pile to create the sample set upon which the Elusion Test is run.

Confidence Level is the probability, expressed as a percentage, that the sample on which the Elusion Test was run is an accurate reflection of the entire discard pile. Simply put, if we had a discard pile of 10,000 documents and test reviewed only 100, we would have tested just 1% of the document pool and would be unlikely to have an accurate representation of the pool as a whole. If, on the other hand, we test reviewed 5,000 documents, we would have tested 50% of the pool and would be much more likely to understand the document landscape.

Note: these percentages are NOT the confidence level itself!

As an example, a 95% confidence level means there is a 95% likelihood that the actual number of eluded documents lies within the range predicted by the Elusion Test. The higher the confidence level you need, the more documents you will have to review and the longer the Elusion Test will take.
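One way to build intuition for what a 95% confidence level means is to simulate many Elusion Tests against a discard pile whose true elusion rate is known, and count how often the predicted range contains that true rate. The sketch below is illustrative only: the function name `coverage`, the 5% true rate, the sample size of 400 and the simple normal-approximation range are all assumptions made for the demonstration.

```python
import math
import random
from statistics import NormalDist


def coverage(true_rate: float = 0.05, sample_size: int = 400,
             confidence: float = 0.95, trials: int = 2000,
             seed: int = 0) -> float:
    """Fraction of simulated tests whose range contains the true elusion rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Each sampled document is relevant with probability true_rate
        relevant = sum(rng.random() < true_rate for _ in range(sample_size))
        p_hat = relevant / sample_size
        # Assumption: simple normal-approximation range around the sample rate
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
        if p_hat - half_width <= true_rate <= p_hat + half_width:
            hits += 1
    return hits / trials


# Expect a value in the region of the requested confidence level
print(coverage())
```

The printed fraction should land close to the requested confidence level, which is the sense in which "95% confident" is meant: roughly 95 out of 100 such tests would produce a range that captures the true rate.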

Figure 2: Here we have a graphical representation of the Confidence Level and Confidence Interval. The dark blue region represents the 95% confidence level that the true number of eluded documents is within the range of the dark-to-light boundaries. These boundaries are otherwise known as the confidence interval.

Margin of Error refers to the conservative range provided by the Elusion Test, within which the actual results for the whole discard pile are likely to lie. It is usually written as a Number ± Margin of Error and represents the maximum difference from the calculated value. For example, Elusion Rate = 4% ± 5% means that the actual elusion rate for your discard pile is somewhere between 0% and 9% of your documents (the lower bound of 4% − 5% would be negative, so it is capped at 0%). Confidence Level and Margin of Error work together to give you the most likely estimate for the number of eluded documents within your discard pile. However, the true range is seldom equally spaced about the presented value, and in the case of RelativityOne the margin of error is also often an overestimate (overly cautious). To calculate a more realistic range of likely eluded documents, you want to calculate the Confidence Interval.
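The arithmetic behind a "Number ± Margin of Error" range can be sketched in a few lines of Python. The helper name `margin_of_error_range` is hypothetical, and values are given in percentage points. The only logic is adding and subtracting the margin, then capping the result between 0% and 100%, since an elusion rate cannot fall outside that range.

```python
def margin_of_error_range(rate_pct: float, margin_pct: float) -> tuple[float, float]:
    """Return (lower, upper) bounds in percentage points for rate ± margin."""
    lower = max(0.0, float(rate_pct - margin_pct))    # a rate cannot be negative
    upper = min(100.0, float(rate_pct + margin_pct))  # or exceed 100%
    return lower, upper


# Elusion Rate = 4% ± 5%
print(margin_of_error_range(4, 5))  # (0.0, 9.0)
```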

Calculating Confidence Interval:

[Image: confidence interval equation]

Now, I think we can all agree that this is quite a daunting equation, but not to worry! All of this is calculated within RelativityOne, so all you have to worry about is what to do with the numbers it provides. It is an incredibly useful calculation, as it gives a much more realistic picture of your discard pile and potentially eluded documents. The outputs of the equation are the highest and lowest percentages of documents in your discard pile that are likely to be relevant, based on the results of your Elusion Test. Where a Margin of Error may give you a rate of 5% ± 7%, a Confidence Interval would give you 5% +7% / −2%, that is, a lower bound of 3% and an upper bound of 12%.

This means that in a discard pile of 10,000 documents, instead of looking at a possible 0 to 1,200 eluded documents, you are more realistically looking at 300 to 1,200. Understanding your lower bound lets you make more accurate decisions about the continuation of your review. With this information in hand, all you have to do is decide whether it is worth continuing your review to find these extra potentially relevant documents, or whether to end your review at this point, saving time and money.
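For readers who want to see how an asymmetric range like this can arise, here is a minimal sketch. It assumes the Wilson score interval, one common way of computing a confidence interval for a proportion. It is not necessarily the equation RelativityOne uses, and the function names and the example figures (5 relevant documents in a sample of 100) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def wilson_interval(relevant_in_sample: int, sample_size: int,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for the elusion rate, as fractions (0.0 to 1.0)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = relevant_in_sample / sample_size  # elusion rate seen in the sample
    denom = 1 + z ** 2 / sample_size
    centre = (p_hat + z ** 2 / (2 * sample_size)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / sample_size + z ** 2 / (4 * sample_size ** 2)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def eluded_document_range(pile_size: int, lower_rate: float,
                          upper_rate: float) -> tuple[int, int]:
    """Convert rate bounds into document counts for the whole discard pile."""
    return round(pile_size * lower_rate), round(pile_size * upper_rate)


# Bounds of 3% and 12% on a pile of 10,000 documents
print(eluded_document_range(10_000, 0.03, 0.12))  # (300, 1200)

# Hypothetical test result: 5 relevant documents found in a sample of 100
low, high = wilson_interval(5, 100)
print(eluded_document_range(10_000, low, high))
```

Note that the interval is not centred on the sample rate: the upper bound sits further from 5% than the lower bound does, which is the kind of asymmetry a simple ± margin of error hides.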
