# Simes' multiple testing procedure

Multiple testing is very often approached using the Bonferroni
correction: if $N$ hypotheses $H_1,\dots,H_N$ are tested, giving
P-values $P_1,\dots,P_N$,
the corrected P-values are given by $P^*_i = N P_i$. The
Bonferroni correction works quite well if the hypotheses are not
strongly correlated; however, if they are positively correlated, it
tends to become very conservative.
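
As a minimal sketch of the correction described above, the following Python function multiplies each P-value by the number of hypotheses. The function name `bonferroni` is hypothetical, and capping the result at 1 is an added assumption (the formula above does not include it), made so that the output stays a valid probability.

```python
def bonferroni(p_values):
    """Return Bonferroni-corrected P-values N * P_i.

    Assumption: results are capped at 1 so they remain valid probabilities;
    the plain formula N * P_i has no such cap.
    """
    n = len(p_values)
    return [min(1.0, n * p) for p in p_values]


# Four hypothetical P-values; each is multiplied by N = 4.
corrected = bonferroni([0.01, 0.02, 0.03, 0.5])
print(corrected)
```

With four hypotheses, a raw P-value of 0.02 becomes 0.08, which would no longer be significant at the 0.05 level.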

A stronger approach is Simes' procedure. Initially, this tests only
the global null hypothesis that all the hypotheses are true, by defining $P^{\mathrm{Simes}} = \min_r N P^{(r)}/r$, where $P^{(1)} \le \dots \le P^{(N)}$ is the
ordered list of P-values. Following the closure principle, however, it
may be extended to a step-down multiple testing procedure.
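
The global Simes P-value defined above can be computed in a few lines. This is an illustrative sketch only: the function name `simes_p_value` is hypothetical, and it covers just the global test, not the closure-based multiple testing procedure.

```python
def simes_p_value(p_values):
    """Return the global Simes P-value: min over r of N * P^(r) / r,
    where P^(1) <= ... <= P^(N) are the sorted P-values."""
    if not p_values:
        raise ValueError("at least one P-value is required")
    n = len(p_values)
    ordered = sorted(p_values)
    # enumerate from 1 so that r is the rank of each ordered P-value
    return min(n * p / r for r, p in enumerate(ordered, start=1))


# Four hypothetical P-values, given in arbitrary order.
p_simes = simes_p_value([0.03, 0.02, 0.5, 0.025])
print(p_simes)
```

For these values the candidates $N P^{(r)}/r$ are roughly 0.08, 0.05, 0.04 and 0.5, so the Simes P-value is about 0.04, whereas the smallest Bonferroni-corrected P-value is 0.08.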

While it can be proved that the Bonferroni correction controls the
false positive rate, that is not true in general for Simes'
procedure. However, it tends to perform very well, particularly for
positively dependent hypotheses, and has been proved valid for distributions
that are multivariate totally positive of order 2 ($\mathrm{MTP}_2$). Cases where
it fails more dramatically tend to be highly constructed and somewhat
bizarre.
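
How the global test behaves for a given distribution can be explored by simulation. The sketch below estimates the ratio of the rejection rate $P[P^{\mathrm{Simes}} \le \alpha]$ to $\alpha$ in the simplest case, where all hypotheses are true and the P-values are independent and uniform; a ratio at or below 1 means the test is valid at that level. The function names, the sample sizes and the choice of independent P-values are all assumptions made for illustration; a dependent case would need its own sampler.

```python
import random


def simes_p_value(p_values):
    """Global Simes P-value: min over r of N * P^(r) / r."""
    n = len(p_values)
    ordered = sorted(p_values)
    return min(n * p / r for r, p in enumerate(ordered, start=1))


def estimate_validity_ratio(n_hypotheses, alpha, n_trials, seed=0):
    """Monte Carlo estimate of P[P_Simes <= alpha] / alpha.

    Assumption: all null hypotheses are true and the P-values are
    independent Uniform(0, 1) draws.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        p_values = [rng.random() for _ in range(n_hypotheses)]
        if simes_p_value(p_values) <= alpha:
            rejections += 1
    return rejections / n_trials / alpha


ratio = estimate_validity_ratio(n_hypotheses=10, alpha=0.05, n_trials=20000)
print(f"estimated ratio: {ratio:.3f}")
```

Under independence the estimate should come out close to 1, up to Monte Carlo noise, since the Simes test is known to be exact in that case.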

## Average validity of Simes' procedure

I have investigated some general properties of Simes'
procedure. It is valid at the $\alpha$ level if
$P[P^{\mathrm{Simes}} \le \alpha]/\alpha \le 1$.
While there are special cases where it fails quite
dramatically, I have shown that this can only be the case at
particular significance levels: for no distribution can Simes'
procedure fail in general. To be more precise, I have shown that
$$\int_u^v \left( e^{t}\,P[P^{\mathrm{Simes}} \le e^{-t}] - 1 \right) dt \le 2 + 2\log N$$
for all $0 \le u < v$, from which it follows that
$$\limsup_{T\to\infty} \frac{1}{T}\int_0^T e^{t}\,P[P^{\mathrm{Simes}} \le e^{-t}]\,dt \le 1.$$
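
The step from the integral bound to the limit can be spelled out by taking $u = 0$ and $v = T$. The lines below are a sketch of that reasoning and are not quoted from the published paper. Note that with $\alpha = e^{-t}$ the integrand $e^{t}\,P[P^{\mathrm{Simes}} \le e^{-t}]$ is exactly the ratio $P[P^{\mathrm{Simes}} \le \alpha]/\alpha$, so the limit is an average of that ratio over significance levels on a logarithmic scale.

```latex
\begin{align*}
\frac{1}{T}\int_0^T e^{t}\,P[P^{\mathrm{Simes}} \le e^{-t}]\,dt
  &= 1 + \frac{1}{T}\int_0^T \left( e^{t}\,P[P^{\mathrm{Simes}} \le e^{-t}] - 1 \right) dt \\
  &\le 1 + \frac{2 + 2\log N}{T}
  \;\longrightarrow\; 1 \quad (T \to \infty).
\end{align*}
```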

Though this does not prove the validity of Simes' procedure, which is of course
impossible since the procedure is not valid in general, it puts a very strong restriction on
how badly it may be expected to perform.

These results have now been published in
Biometrika: see reference and link to article below.

## Use of Simes' procedure in bioinformatics

Bioinformatics involves a great deal of multiple hypothesis testing in which
the hypotheses are strongly correlated. Thus, using Simes' procedure
could be a large improvement over the Bonferroni correction.

In a paper in progress, I argue for the use of Simes P-values and
a variety of E-values based on Simes' procedure in large
database searches such as sequence homology searches.