Simplifying the statistics behind the Elusion Test

Written by Altlaw

The Elusion Test is a validation test run at the stabilisation point of a Technology Assisted Review (TAR). It estimates how accurately your active learning algorithm has identified relevant documents. There are two types of Elusion Test, so you can choose the one that suits your data and desired outcomes:

• Fixed Testing – A test sample of a specified size is created.

• Statistical Testing – A random test sample is created where the size of the sample is dependent on a given Confidence and Margin of Error.
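To make the link between confidence, margin of error and sample size concrete, here is a minimal sketch in Python. It uses the standard textbook sample size formula for a proportion (Cochran's formula with a finite population correction), which is an assumption on our part and not necessarily the exact calculation your review platform performs. The function name `sample_size` and the parameter `expected_rate` are hypothetical; an `expected_rate` of 0.5 is the most cautious choice because it gives the largest sample.

```python
import math
from statistics import NormalDist


def sample_size(pile_size: int, confidence: float = 0.95,
                margin_of_error: float = 0.05,
                expected_rate: float = 0.5) -> int:
    """Documents to sample from a discard pile of `pile_size` documents.

    Illustrative only: textbook formula, not a specific platform's method.
    """
    # z-score for a two-sided interval (about 1.96 for 95% confidence)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively unlimited pile (Cochran's formula)
    n0 = (z ** 2) * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    # Finite population correction: a smaller pile needs fewer documents
    n = n0 / (1 + (n0 - 1) / pile_size)
    return min(pile_size, math.ceil(n))


# Compare three confidence levels for a 10,000-document discard pile
for confidence in (0.90, 0.95, 0.99):
    print(confidence, sample_size(10_000, confidence, margin_of_error=0.05))
```

Under these assumptions, a 10,000-document discard pile at 95% confidence and a 5% margin of error calls for a sample of 370 documents. Raising the confidence level or tightening the margin of error increases that number.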

Whichever test you choose, the confidence level and margin of error are crucial to interpreting the results of your Elusion Test, yet their meaning is not always clear. To understand what they refer to here, we must first understand that the Elusion Test is run on a sample of documents taken from the ‘discard pile’ created by the active learning algorithm. The discard pile is made up of all the documents to which the algorithm has given a relevancy score below your chosen cutoff. None of these documents has been reviewed, so we as reviewers have no idea what the document landscape of this pile looks like. This is where confidence level and margin of error come into play.

Figure 1: A random sample of documents is taken from the discard pile to create the sample set upon which the Elusion Test is run.
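The sampling step in Figure 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how any particular platform does it: the function `draw_elusion_sample`, the document IDs and the scores are all made up, and relevancy scores are assumed to be held in a simple dictionary.

```python
import random
from typing import Dict, List, Optional


def draw_elusion_sample(scores: Dict[str, float], cutoff: float,
                        sample_size: int,
                        seed: Optional[int] = None) -> List[str]:
    """Pick a simple random sample of document IDs from the discard pile."""
    # The discard pile: every document scored below the chosen cutoff
    discard_pile = sorted(doc_id for doc_id, score in scores.items()
                          if score < cutoff)
    # A seeded generator makes the draw repeatable for audit purposes
    rng = random.Random(seed)
    # Sample without replacement; never ask for more than the pile holds
    return rng.sample(discard_pile, min(sample_size, len(discard_pile)))


# Hypothetical scores for 1,000 documents, evenly spread from 0.000 to 0.999
scores = {f"DOC-{i:05d}": i / 1000 for i in range(1000)}
sample = draw_elusion_sample(scores, cutoff=0.5, sample_size=100, seed=1)
print(len(sample), sample[:3])
```

Every document in the discard pile has the same chance of being picked, which is what allows the results from the sample to be projected back onto the whole pile.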

Confidence Level is the probability, expressed as a percentage, that the sample on which the Elusion Test was run accurately reflects the entire discard pile. Simply put, if we had a discard pile of 10,000 documents and test reviewed only 100, we would have tested just 1% of the document pool, so we are less likely to have an accurate picture of the pool as a whole. If, on the other hand, we test reviewed 5,000 documents, we would have tested 50% of the pool and would be much more likely to understand the document landscape.

Note: these percentages are NOT the confidence level itself!

As an example, a 95% confidence level means there is a 95% likelihood that the actual number of eluded documents falls within the range predicted by the Elusion Test. The higher the confidence level you need, the more documents you will have to review and the longer the Elusion Test will take.
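To show how a predicted range of eluded documents might be produced, here is a rough Python sketch. It assumes a Wilson score interval, a common textbook method for proportions, which may differ from the calculation your review platform uses. The function `elusion_estimate` and the figures in the example (7 relevant documents found in a sample of 370, drawn from a 10,000-document discard pile) are hypothetical, and the sketch ignores any adjustment for the pile being finite.

```python
import math
from statistics import NormalDist


def elusion_estimate(relevant_found: int, sample_size: int,
                     pile_size: int, confidence: float = 0.95) -> dict:
    """Estimate the elusion rate and project it onto the discard pile."""
    # z-score for a two-sided interval (about 1.96 for 95% confidence)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Observed elusion rate: share of sampled documents found relevant
    rate = relevant_found / sample_size
    # Wilson score interval (assumed method, see the note above)
    denom = 1 + z ** 2 / sample_size
    centre = (rate + z ** 2 / (2 * sample_size)) / denom
    half_width = (z / denom) * math.sqrt(
        rate * (1 - rate) / sample_size + z ** 2 / (4 * sample_size ** 2))
    low = max(0.0, centre - half_width)
    high = min(1.0, centre + half_width)
    return {
        "elusion_rate": rate,
        "rate_range": (low, high),
        # Scale the range up to the whole discard pile
        "eluded_documents_range": (math.floor(low * pile_size),
                                   math.ceil(high * pile_size)),
    }


result = elusion_estimate(relevant_found=7, sample_size=370, pile_size=10_000)
print(result)
```

Running the same figures at 99% confidence gives a wider range than at 90%: being more certain that the true number falls inside the range means accepting a broader range, unless you review more documents.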