The normal distribution is very often used as a limiting distribution. While it is often an appropriate approximation for small to moderate deviations, it can be inappropriate for large deviations, as has been found in analyses of word counts in DNA sequences.

In many cases, obtaining an exact distribution is difficult, but information on the distribution, as encoded in the cumulant generating function, is available. In those cases, deviance residuals, which can be calculated from the cumulant generating function or estimated from an approximation of it, may provide accurate large deviation statistics.
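
As a rough illustration of the idea, the sketch below computes a signed root deviance from a cumulant generating function K by taking the Legendre transform I(x) = sup_t (t x - K(t)) and returning sign(x - mean) * sqrt(2 I(x)). This is one common reading of "deviance residual", and it is an assumption here rather than a statement of the exact method meant above. The function name signed_root_deviance, the arguments K and dK, and the Poisson example with mean lam are all illustrative choices, not taken from the text.

```python
import math

def signed_root_deviance(x, K, dK, mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Signed root of twice the Legendre transform of K, evaluated at x.

    K is the cumulant generating function and dK its derivative (both
    hypothetical callables supplied by the user). The bracket [lo, hi]
    is assumed to contain the solution of dK(t) = x.
    """
    # K is convex, so dK is increasing and bisection finds the tilt t.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dK(mid) < x:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    t = 0.5 * (lo + hi)
    rate = t * x - K(t)  # large deviation rate function I(x)
    return math.copysign(math.sqrt(max(2.0 * rate, 0.0)), x - mean)

# Illustrative example: a Poisson count with mean lam (an assumption).
lam = 10.0
K = lambda t: lam * (math.exp(t) - 1.0)
dK = lambda t: lam * math.exp(t)

r = signed_root_deviance(25.0, K, dK, mean=lam)  # deviance-based statistic
z = (25.0 - lam) / math.sqrt(lam)                # plain normal z-score
print(r, z)
```

For the Poisson case the transform has the closed form x log(x/lam) - x + lam, so the numerical result can be checked directly. In the upper tail the deviance-based statistic is smaller than the plain z-score, which is the kind of discrepancy that matters for large deviations.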

Quite a bit of theory exists on this issue, but it is too often not applied. I have applied it a few times with great success, although effective application, in particular finding a natural way of computing and expressing the cumulant generating function, is not always straightforward.

Last modified
June 21, 2007.