Multiple testing is very often approached using the Bonferroni correction: if N hypotheses $H_1, \dots, H_N$ are tested, giving P-values $P_1, \dots, P_N$, the corrected P-values are given by $P^*_i = N P_i$. The Bonferroni correction works quite well if the hypotheses are not strongly correlated; if they are positively correlated, however, it tends to become very conservative.
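The correction described above can be sketched in a few lines of Python. This is only an illustration: the function name `bonferroni` is hypothetical, and capping the corrected values at 1 is a common convention assumed here, not something stated in the text.

```python
def bonferroni(p_values):
    """Return Bonferroni-corrected P-values N * P_i.

    Assumption: corrected values are capped at 1 so that they remain
    valid probabilities; the text itself only gives N * P_i.
    """
    n = len(p_values)
    return [min(1.0, n * p) for p in p_values]


# Illustrative input only, not data from any real study.
corrected = bonferroni([0.02, 0.025, 0.03, 0.5])
print(corrected)
```

With four hypotheses, each P-value is simply multiplied by 4, so only raw P-values below 0.0125 remain significant at the 5% level.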
A stronger approach is Simes' procedure. This initially tests only the overall hypothesis that all the hypotheses are true, by defining $P_{\mathrm{Simes}} = \min_{1 \le r \le N} N P_{(r)}/r$, where $P_{(1)} \le \dots \le P_{(N)}$ is the ordered list of P-values. Following the closure principle, however, it may be turned into a step-down multiple testing procedure.
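As a minimal sketch of the definition above, the Simes P-value for the overall hypothesis can be computed by sorting the P-values and taking the minimum of N times the r-th smallest P-value divided by r. The function name `simes_p_value` and the example input are hypothetical; the closure-based multiple testing procedure is not implemented here.

```python
def simes_p_value(p_values):
    """Simes P-value for the overall hypothesis: min over r of N * P_(r) / r."""
    if not p_values:
        raise ValueError("at least one P-value is required")
    n = len(p_values)
    ordered = sorted(p_values)  # P_(1) <= ... <= P_(N)
    return min(n * p / r for r, p in enumerate(ordered, start=1))


# Illustrative input only: the candidates N * P_(r) / r are roughly
# 0.08, 0.05, 0.04 and 0.5, so the Simes P-value is about 0.04,
# whereas the smallest Bonferroni-corrected P-value is 0.08.
print(simes_p_value([0.02, 0.025, 0.03, 0.5]))
```

The term with r = 1 is exactly the smallest Bonferroni-corrected P-value, so the Simes P-value can never exceed it.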
While it can be proved that the Bonferroni correction controls the false positive rate, that is not true in general for Simes' procedure. However, Simes' procedure tends to perform very well, particularly for positively dependent hypotheses, and it has been proved valid for distributions that are multivariate totally positive of order 2 (MTP2). Cases where it fails more dramatically tend to be highly constructed and somewhat bizarre.
I have investigated some general properties of Simes' procedure. It is valid at the α level if $P[P_{\mathrm{Simes}} \le \alpha]/\alpha \le 1$. While there are special cases where it fails quite dramatically, I have shown that this can only be the case at particular significance levels: for no distribution can Simes' procedure fail in general. To be more precise, I have shown that $\int_u^v \left( e^{t} \, P[P_{\mathrm{Simes}} \le e^{-t}] - 1 \right) dt \le 2 + 2\log N$ for all $0 \le u < v$, from which it follows that $\limsup_{T \to \infty} \frac{1}{T} \int_0^T e^{t} \, P[P_{\mathrm{Simes}} \le e^{-t}] \, dt \le 1$.
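The step from the integral bound to the limit statement can be spelled out as follows. This is a short sketch of the reasoning, reading the subtracted 1 as part of the integrand and substituting α = e^{-t}, so that the integrand is the excess of the validity ratio over 1 at level α.

```latex
% With \alpha = e^{-t}:  e^{t} P[P_{Simes} \le e^{-t}] = P[P_{Simes} \le \alpha] / \alpha.
% Take u = 0 and v = T in the integral bound:
\begin{align*}
\int_0^T \left( e^{t}\, P[P_{\mathrm{Simes}} \le e^{-t}] - 1 \right) dt
  &\le 2 + 2\log N \\
\Longrightarrow\quad
\int_0^T e^{t}\, P[P_{\mathrm{Simes}} \le e^{-t}]\, dt
  &\le T + 2 + 2\log N \\
\Longrightarrow\quad
\frac{1}{T}\int_0^T e^{t}\, P[P_{\mathrm{Simes}} \le e^{-t}]\, dt
  &\le 1 + \frac{2 + 2\log N}{T}
  \;\xrightarrow[T \to \infty]{}\; 1 .
\end{align*}
```

Since the bound does not depend on T, the correction term vanishes in the limit, so the validity ratio averaged over the logarithm of the significance level cannot exceed 1 asymptotically.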
Though this does not prove the validity of Simes' procedure, which is of course impossible since it does not hold in general, it puts a very strong restriction on how badly the procedure may be expected to perform.
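The validity ratio, the probability that the Simes P-value is at most α divided by α, can also be estimated by simulation for a given distribution of P-values. The sketch below does this for the simplest case of independent uniform P-values, where Simes' procedure is known to be exact, so the estimate should come out close to 1. The function names, the choice of N = 10, α = 0.05 and the number of trials are all assumptions made for illustration; this is not the method used to obtain the results above.

```python
import random


def simes_p_value(p_values):
    """Simes P-value: min over r of N * P_(r) / r for sorted P-values."""
    n = len(p_values)
    ordered = sorted(p_values)
    return min(n * p / r for r, p in enumerate(ordered, start=1))


def estimate_validity_ratio(n_hypotheses, alpha, n_trials, seed=0):
    """Monte Carlo estimate of P[P_Simes <= alpha] / alpha.

    Assumption: all hypotheses are true and the P-values are independent
    and uniform on [0, 1]; other dependence structures would need a
    different sampler.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_trials):
        p_values = [rng.random() for _ in range(n_hypotheses)]
        if simes_p_value(p_values) <= alpha:
            hits += 1
    return hits / n_trials / alpha


ratio = estimate_validity_ratio(n_hypotheses=10, alpha=0.05, n_trials=20000)
print(ratio)  # expected to be near 1, up to Monte Carlo error
```

Replacing the sampler with a positively dependent or an adversarially constructed distribution would show how the ratio moves below or above 1 at particular values of α.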
These results have now been published in Biometrika: see reference and link to article below.
Bioinformatics involves a great deal of multiple hypothesis testing in which the hypotheses are strongly correlated. Thus, using Simes' procedure could be a large improvement over the Bonferroni correction.
In a paper in progress, I argue for the use of Simes P-values, and a variety of E-values based on Simes' procedure, in large database searches such as sequence homology searches.