Diversity

Diversity statistics

Typical application: Quantifying alpha diversity in samples
Assumptions: Representative samples
Data needed: One or more columns, each containing counts of individuals of different taxa down the rows

These statistics apply to association data, where numbers of individuals are tabulated in rows (taxa) and possibly several columns (associations). The following statistics are available for each association:

• Number of taxa (S)

• Total number of individuals (n)

• Dominance=1-Simpson index. Ranges from near 0 (many taxa, all equally present; with S equally abundant taxa the minimum is 1/S) to 1 (one taxon dominates the community completely).
$D = \sum_i (n_i/n)^2$, where $n_i$ is the number of individuals of taxon $i$.

• Simpson index=1-dominance. Measures 'evenness' of the community from 0 to 1. Note the confusion in the literature: Dominance and Simpson indices are often interchanged!

• Shannon index (entropy). A diversity index, taking into account the number of individuals as well as number of taxa. Varies from 0 for communities with only a single taxon to high values for communities with many taxa, each with few individuals.
$H = -\sum_i (n_i/n) \ln(n_i/n)$

• Buzas and Gibson's evenness = $e^H/S$

• Menhinick's richness index - the ratio of the number of taxa to the square root of sample size.

• Margalef's richness index: $(S-1)/\ln n$, where S is the number of taxa, and n is the number of individuals.

• Equitability. Shannon diversity divided by the logarithm of number of taxa. This measures the evenness with which individuals are divided among the taxa present.

• Fisher's alpha - a diversity index, defined implicitly by the formula $S = \alpha \ln(1 + n/\alpha)$, where S is the number of taxa, n is the number of individuals and $\alpha$ is Fisher's alpha.

• Berger-Parker dominance: simply the number of individuals in the dominant taxon relative to n.
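The indices listed above can all be computed from a single column of counts. The following is a minimal Python sketch, not PAST's own code: the function names `alpha_indices` and `fisher_alpha` are hypothetical, natural logarithms are assumed throughout, and Fisher's alpha is solved here by simple bisection, which is only one way of handling the implicit equation.

```python
import math


def fisher_alpha(S: float, n: float) -> float:
    """Solve S = a * ln(1 + n/a) for a by bisection in log space (needs 0 < S < n)."""
    if not 0 < S < n:
        raise ValueError("Fisher's alpha is defined here only for 0 < S < n")
    lo, hi = 1e-9, 1e12
    for _ in range(200):
        mid = math.sqrt(lo * hi)              # geometric midpoint of the bracket
        if mid * math.log1p(n / mid) < S:     # a*ln(1+n/a) increases with a
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def alpha_indices(counts):
    """Diversity statistics for one column of taxon counts (zero counts are ignored)."""
    x = [c for c in counts if c > 0]
    n, S = sum(x), len(x)
    p = [c / n for c in x]
    D = sum(pi * pi for pi in p)                   # dominance
    H = -sum(pi * math.log(pi) for pi in p)        # Shannon index, natural log
    return {
        "S": S,
        "n": n,
        "dominance": D,
        "simpson": 1 - D,
        "shannon": H,
        "buzas_gibson": math.exp(H) / S,
        "menhinick": S / math.sqrt(n),
        "margalef": (S - 1) / math.log(n),         # undefined for n = 1
        "equitability": H / math.log(S) if S > 1 else float("nan"),
        # S == n corresponds to alpha -> infinity
        "fisher_alpha": fisher_alpha(S, n) if S < n else float("inf"),
        "berger_parker": max(x) / n,
    }
```

For example, `alpha_indices([5, 3, 2])["dominance"]` evaluates to 0.38 up to floating-point rounding.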

Many of these indices are explained in Harper (1999).

Approximate confidence intervals for all these indices can be computed with a bootstrap procedure. 1000 random samples are produced (200 prior to version 0.87b), each with the same total number of individuals as the original sample. The random samples are taken from the total, pooled data set (all columns): for each individual in a random sample, the taxon is chosen with probabilities proportional to the original, pooled abundances. A 95 percent confidence interval is then calculated from the replicates. Note that the diversity in the replicates will usually be lower than the pooled diversity of the total data set, because each replicate contains fewer individuals than the pooled set; for the number of taxa it can never be higher.
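A minimal sketch of the pooled bootstrap described above, assuming the data are given as a list of columns (one list of taxon counts per sample). The names `bootstrap_ci`, `shannon` and `richness` are hypothetical, and the simple percentile rule used to read off the 2.5 and 97.5 percent limits is an assumption; PAST may interpolate differently.

```python
import math
import random


def shannon(counts):
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def richness(counts):
    return sum(1 for c in counts if c > 0)


def bootstrap_ci(columns, index, replicates=1000, seed=0):
    """95% intervals per column, resampled from the pooled abundances of all columns."""
    rng = random.Random(seed)
    pooled = [sum(row) for row in zip(*columns)]   # total count per taxon
    taxa = range(len(pooled))
    intervals = []
    for col in columns:
        n = sum(col)
        values = []
        for _ in range(replicates):
            # draw n individuals; taxon chosen with probability proportional to pooled abundance
            counts = [0] * len(pooled)
            for t in rng.choices(taxa, weights=pooled, k=n):
                counts[t] += 1
            values.append(index(counts))
        values.sort()
        lo = values[int(0.025 * replicates)]
        hi = values[int(0.975 * replicates) - 1]
        intervals.append((lo, hi))
    return intervals
```

Each column's observed index can then be checked against its own interval, which is the use suggested in the next paragraph.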

Since these confidence intervals are all computed with respect to the pooled data set, they do not represent confidence intervals for the individual samples. They are mainly useful for identifying samples where the given diversity index falls outside the confidence interval. Bootstrapped comparison of diversity indices in two samples is provided in the Compare diversities module.

Typical application: Estimating species richness from several quadrat samples
Assumptions: Representative, random quadrats of equal size
Data needed: Two or more columns, each containing presence/absence (1/0) of different taxa down the rows

Four non-parametric species richness estimators are included in PAST: Chao2, first- and second-order jackknife, and bootstrap. All of these require presence-absence data in two or more sampled quadrats of equal size. Colwell & Coddington (1994) reviewed these estimators and found that Chao2 and the second-order jackknife performed best.
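The four estimators can be written in terms of the number of quadrats m, the observed number of taxa S_obs, and the numbers of taxa found in exactly one quadrat (Q1) and exactly two quadrats (Q2). The sketch below uses the textbook forms of these estimators as we understand them from the literature; the function name is hypothetical, and the exact variants used in PAST (for example whether a bias-corrected Chao2 is used) are not stated here, so treat the results as illustrative.

```python
def richness_estimators(incidence):
    """incidence: rows = taxa, columns = quadrats, entries 1/0 (presence/absence)."""
    m = len(incidence[0])                                    # number of quadrats
    freq = [sum(1 for v in row if v) for row in incidence]   # quadrats occupied per taxon
    freq = [f for f in freq if f > 0]
    s_obs = len(freq)
    q1 = freq.count(1)   # "uniques": taxa found in exactly one quadrat
    q2 = freq.count(2)   # "duplicates": taxa found in exactly two quadrats
    if q2 > 0:
        chao2 = s_obs + q1 * q1 / (2 * q2)
    else:
        # bias-corrected form, used here only to avoid dividing by zero
        chao2 = s_obs + (m - 1) / m * q1 * (q1 - 1) / 2
    jack1 = s_obs + q1 * (m - 1) / m
    jack2 = s_obs + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1))
    # bootstrap: add the chance that each taxon is missed, with p = occupied fraction
    boot = s_obs + sum((1 - f / m) ** m for f in freq)
    return {"S_obs": s_obs, "chao2": chao2, "jack1": jack1, "jack2": jack2, "bootstrap": boot}
```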

Beta diversity

Typical application: Quantifying overall beta diversity in a set of samples
Assumptions: Representative samples
Data needed: Two or more rows (samples) of presence-absence (0/1) data, with taxa in columns

The beta diversity module in PAST can be used for any number of samples, not only two. The eight measures available are described in Koleff et al. (2003), and the table below refers to their notation:

PAST name        Koleff et al. (2003) notation
Whittaker        β_w
Harrison         β_-1
Cody             β_c
Routledge        β_I
Wilson-Shmida    β_t
Mourelle         β_me
Harrison 2       β_-2
Williams         β_-3
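As an illustration for the first two rows of the table, the sketch below uses the common multi-sample forms β_w = S/ᾱ - 1 (S: number of taxa in the whole set, ᾱ: mean number of taxa per sample) and β_-1 = β_w/(N - 1) for N samples. These forms are an assumption taken from the beta diversity literature rather than from this manual, and the function name is hypothetical.

```python
def beta_whittaker_harrison(samples):
    """samples: rows = samples, columns = taxa, entries 1/0 (presence/absence)."""
    N = len(samples)
    present = [[v > 0 for v in row] for row in samples]
    S = sum(1 for col in zip(*present) if any(col))       # taxa in the whole set
    alpha_bar = sum(sum(row) for row in present) / N      # mean taxa per sample
    bw = S / alpha_bar - 1                                # Whittaker
    b_minus1 = bw / (N - 1)                               # Harrison (beta_-1)
    return bw, b_minus1
```

Identical samples give 0 for both measures, while completely disjoint samples with equal richness give β_w = N - 1 and β_-1 = 1.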

Taxonomic distinctness

Typical application: Quantifying taxonomical distinctness in samples
Assumptions: Representative samples
Data needed: One or more columns, each containing counts of individuals of different taxa down the rows. In addition, the leftmost column(s) must contain names of genera/families etc. (see below).

This module computes taxonomic diversity and taxonomic distinctness as defined by Clarke & Warwick (1998), including confidence intervals computed from 200 random replicates taken from the pooled data set (all columns). Note that the "global list" of Clarke & Warwick is not entered directly, but is calculated internally by pooling (summing) the given samples.

These indices also depend on taxonomic information above the species level, which has to be entered for each species as follows. Species names go in the name column (leftmost, fixed column), genus names in column 1, family in column 2, etc. Species counts follow in the columns thereafter. The program will ask for the number of columns containing taxonomic information above the species level.

For presence-absence data, taxonomic diversity and distinctness will be valid but equal to each other.
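The sketch below follows the Clarke & Warwick (1998) definitions: taxonomic diversity Δ divides the weighted sum of ω_ij x_i x_j over species pairs by N(N-1)/2, while taxonomic distinctness Δ* divides the same sum by the unweighted sum of x_i x_j over species pairs. The path weights (1 for the same genus, 2 for the same family, and so on, one step per level) are an assumption, as are the function names. With presence-absence data every x_i is 1, so both denominators equal S(S-1)/2, which is why the two indices coincide.

```python
from itertools import combinations


def path_weight(a, b):
    """a, b: tuples of higher taxa for two species, e.g. (genus, family, order)."""
    for level, (ta, tb) in enumerate(zip(a, b), start=1):
        if ta == tb:
            return level           # 1 = same genus, 2 = same family, ...
    return len(a) + 1              # no shared taxon at any recorded level


def taxonomic_indices(taxonomy, counts):
    """Return (Delta, Delta*) for one sample; taxonomy[i] belongs to counts[i]."""
    num = 0.0   # sum over species pairs i<j of w_ij * x_i * x_j
    den = 0.0   # sum over species pairs i<j of x_i * x_j
    for i, j in combinations(range(len(counts)), 2):
        xx = counts[i] * counts[j]
        num += path_weight(taxonomy[i], taxonomy[j]) * xx
        den += xx
    N = sum(counts)
    delta = num / (N * (N - 1) / 2)      # taxonomic diversity
    delta_star = num / den               # taxonomic distinctness
    return delta, delta_star
```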

Individual rarefaction

Typical application: Comparing taxonomical diversity in samples of different sizes
Assumptions: When comparing samples, the samples should be taxonomically similar, obtained using standardised sampling and taken from similar 'habitat'.
Data needed: One or more columns of counts of individuals of different taxa (each column must have the same number of values)

Given one or more columns of abundance data for a number of taxa, this module estimates how many taxa you would expect to find in a sample with a smaller total number of individuals. With this method, you can compare the number of taxa in samples of different size. Using rarefaction analysis on your largest sample, you can read out the number of expected taxa for any smaller sample size (including that of the smallest sample). The algorithm is from Krebs (1989). An example application in paleontology can be found in Adrain et al. (2000).

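Let N be the total number of individuals in the sample, s the total number of species, and $N_i$ the number of individuals of species number i. The expected number of species $E(S_n)$ in a sample of size n and the variance $V(S_n)$ are then given by

$$E(S_n) = \sum_{i=1}^{s} \left[ 1 - \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \right]$$

$$V(S_n) = \binom{N}{n}^{-1} \left[ \sum_{i=1}^{s} \binom{N-N_i}{n} \left( 1 - \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \right) + 2 \sum_{j=2}^{s} \sum_{i=1}^{j-1} \left( \binom{N-N_i-N_j}{n} - \frac{\binom{N-N_i}{n} \binom{N-N_j}{n}}{\binom{N}{n}} \right) \right]$$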

Standard errors (square roots of variances) are given by the program. In the graphical plot, these standard errors are converted to 95 percent confidence intervals.
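The expected species count and its variance can be evaluated exactly with binomial coefficients. In this sketch (hypothetical function name `rarefy`), q_i is the probability that species i is absent from a random subsample of n individuals, and the variance is accumulated as per-species absence variances plus pairwise covariance terms, the standard way of deriving V(S_n). Python's arbitrary-precision integers keep the large binomial coefficients exact until the final division.

```python
from math import comb


def rarefy(counts, n):
    """Expected taxa E(S_n) and variance V(S_n) for a random subsample of n individuals."""
    Ni = [c for c in counts if c > 0]
    N = sum(Ni)
    total = comb(N, n)
    # q[i]: probability that taxon i is absent (comb returns 0 when n > N - Ni)
    q = [comb(N - a, n) / total for a in Ni]
    expected = sum(1 - qi for qi in q)
    var = sum(qi * (1 - qi) for qi in q)
    for i in range(len(Ni)):
        for j in range(i):
            both_absent = comb(N - Ni[i] - Ni[j], n) / total
            var += 2 * (both_absent - q[i] * q[j])
    return expected, max(var, 0.0)   # clamp tiny negative rounding errors
```

The standard error plotted by the program corresponds to `math.sqrt(var)`.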

Sample rarefaction (Mao tau)

Typical application: Computing species accumulation curves as a function of number of samples
Assumptions: Similar to individual-based rarefaction
Data needed: A matrix of presence-absence data (abundances treated as presences), with taxa in rows and samples in columns.

Sample-based rarefaction (also known as the species accumulation curve) is applicable when a number of samples are available, from which species richness is to be estimated as a function of number of samples. PAST implements the analytical solution known as "Mao tau", with standard deviation. In the graphical plot, the standard errors are converted to 95 percent confidence intervals.

See Colwell et al. (2004) for details.
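Mao tau has a closed form in terms of s_j, the number of taxa found in exactly j of the H samples: τ(h) = S_obs - Σ_j α_jh s_j, with α_jh = C(H-j, h)/C(H, h), which is zero when j + h > H. The sketch below computes only this expected curve, not its standard deviation; the function name is hypothetical and the formula is our reading of Colwell et al. (2004), so check it against that paper before relying on it.

```python
from math import comb


def mao_tau(incidence):
    """Expected taxa tau(h) for h = 1..H samples (rows = taxa, columns = samples)."""
    H = len(incidence[0])
    s = [0] * (H + 1)                # s[j]: number of taxa found in exactly j samples
    for row in incidence:
        f = sum(1 for v in row if v)
        if f > 0:
            s[f] += 1
    s_obs = sum(s)
    curve = []
    for h in range(1, H + 1):
        # alpha_jh: chance that a taxon present in j samples is missed by h random samples
        missing = sum(comb(H - j, h) / comb(H, h) * s[j] for j in range(1, H + 1))
        curve.append(s_obs - missing)
    return curve
```

At h = 1 the curve equals the mean number of taxa per sample, and at h = H it equals the observed total.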

Diversity curves

Typical application: Plotting diversity curves from occurrence data
Assumptions: None
Data needed: Abundance or presence/absence matrix with samples in rows (lowest sample at bottom) and taxa in columns

Found in the 'Strat' menu, this simple tool allows plotting of diversity curves from occurrence data in a stratigraphical column. Note that samples should be in stratigraphical order, with the uppermost (youngest) sample in the uppermost row. Data are subjected to the range-through assumption (absences between first and last appearance are treated as presences). Originations and extinctions are in absolute numbers, not percentages.

The 'Endpoint correction' option counts a first appearance (FAD) or last appearance (LAD) in a sample as 0.5 instead of 1 in that sample. A taxon with both its FAD and LAD in the same sample counts as 0.33.
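A minimal sketch of the range-through counting and the endpoint correction described above. It assumes that each taxon's oldest occurrence (lowest row) is its FAD and its youngest occurrence (highest row) its LAD, and that originations and extinctions are tallied in the FAD and LAD samples respectively; the function name and this exact bookkeeping are assumptions, not a description of PAST's code.

```python
def diversity_curve(matrix, endpoint_correction=False):
    """matrix: rows = samples, uppermost (youngest) first; columns = taxa; > 0 = present.

    Returns per-sample (richness, originations, extinctions) under range-through.
    """
    n_samples = len(matrix)
    richness = [0.0] * n_samples
    originations = [0] * n_samples
    extinctions = [0] * n_samples
    for taxon in range(len(matrix[0])):
        rows = [r for r in range(n_samples) if matrix[r][taxon] > 0]
        if not rows:
            continue
        lad, fad = rows[0], rows[-1]     # youngest row = LAD, oldest row = FAD
        originations[fad] += 1
        extinctions[lad] += 1
        for r in range(lad, fad + 1):    # range-through: gaps count as presences
            weight = 1.0
            if endpoint_correction:
                if r == fad and r == lad:
                    weight = 0.33        # FAD and LAD in the same sample
                elif r in (fad, lad):
                    weight = 0.5
            richness[r] += weight
    return richness, originations, extinctions
```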

Compare diversities

Typical application: Comparing diversities in two samples of abundance data
Assumptions: Equal sampling conditions
Data needed: Two columns of abundance data with taxa down the rows

This module computes a number of diversity indices for two samples, and then compares the diversities using two different randomisation procedures as follows.

Bootstrapping
The two samples A and B are pooled. 1000 random pairs of samples (Ai,Bi) are then taken from this pool, with the same numbers of individuals as in the original two samples. For each replicate pair, the diversity indices div(Ai) and div(Bi) are computed. The proportion of replicates in which |div(Ai)-div(Bi)| exceeds or equals |div(A)-div(B)| estimates the probability that the observed difference could have occurred by random sampling from one parent population, as estimated by the pooled sample.

A small probability value p(same) then indicates a significant difference in diversity index between the two samples.

Permutation
1000 random matrices with two columns (samples) are generated, each with the same row and column totals as in the original data matrix. The p value is computed as for the bootstrap test.
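Both randomisation procedures can be sketched in a few lines. Here the permutation test is implemented by shuffling the pooled individuals and splitting them back into samples of the original sizes, which keeps both the taxon (row) totals and the sample (column) totals fixed. The function names and the default Shannon index are illustrative assumptions; any diversity function of a count vector can be passed as `index`.

```python
import math
import random


def shannon(counts):
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def tally(draw, n_taxa):
    counts = [0] * n_taxa
    for t in draw:
        counts[t] += 1
    return counts


def compare_diversities(a, b, index=shannon, replicates=1000, seed=0):
    """Return (p_bootstrap, p_permutation) for the difference index(a) - index(b)."""
    rng = random.Random(seed)
    k, na, nb = len(a), sum(a), sum(b)
    observed = abs(index(a) - index(b))
    pooled = [x + y for x, y in zip(a, b)]
    individuals = [t for t, c in enumerate(pooled) for _ in range(c)]  # taxon labels
    hits_boot = hits_perm = 0
    for _ in range(replicates):
        # bootstrap: draw both samples with replacement from the pooled abundances
        ai = tally(rng.choices(range(k), weights=pooled, k=na), k)
        bi = tally(rng.choices(range(k), weights=pooled, k=nb), k)
        hits_boot += abs(index(ai) - index(bi)) >= observed
        # permutation: reshuffle individuals between the two samples
        rng.shuffle(individuals)
        ap, bp = tally(individuals[:na], k), tally(individuals[na:], k)
        hits_perm += abs(index(ap) - index(bp)) >= observed
    return hits_boot / replicates, hits_perm / replicates
```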

Diversity t test

Typical application: Comparing Shannon diversities in two samples of abundance data
Assumptions: Equal sampling conditions
Data needed: Two columns of abundance data with taxa down the rows

Comparison of the Shannon diversities (entropies) in two samples, using a t test described by Poole (1974). This is an alternative to the randomization test available in the Compare diversities module.

Note that the Shannon indices here include a bias correction term (Poole 1974), and may diverge slightly from the uncorrected estimates calculated elsewhere in PAST, at least for small samples.
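A sketch of the Shannon t test, assuming the commonly cited Hutcheson-type formulas: Var(H) = [Σ p_i (ln p_i)² - (Σ p_i ln p_i)²]/N + (S-1)/(2N²), t = (H_1 - H_2)/sqrt(Var_1 + Var_2), and df = (Var_1 + Var_2)²/(Var_1²/N_1 + Var_2²/N_2). We also assume the bias correction adds (S-1)/(2N) to the plug-in Shannon estimate, since that estimate is biased low; the exact correction used in PAST may differ. The p value would come from a t distribution with df degrees of freedom (for example via a statistics library), which is left out to keep the block dependency-free.

```python
import math


def shannon_t_test(a, b):
    """Return (t, df) comparing bias-corrected Shannon indices of two count vectors."""
    def stats(counts):
        x = [c for c in counts if c > 0]
        N, S = sum(x), len(x)
        p = [c / N for c in x]
        s1 = sum(pi * math.log(pi) for pi in p)           # sum p ln p (= -H)
        s2 = sum(pi * math.log(pi) ** 2 for pi in p)      # sum p (ln p)^2
        H = -s1 + (S - 1) / (2 * N)                       # assumed bias correction
        var = (s2 - s1 ** 2) / N + (S - 1) / (2 * N ** 2)
        return H, var, N

    h1, v1, n1 = stats(a)
    h2, v2, n2 = stats(b)
    t = (h1 - h2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / n1 + v2 ** 2 / n2)
    return t, df
```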

Diversity profiles

Typical application: Comparing diversities in two samples of abundance data
Assumptions: Equal sampling conditions
Data needed: Two columns of abundance data with taxa down the rows

Comparing diversities in two samples can be criticized because the choice of diversity index is arbitrary. One sample may, for example, contain a larger number of taxa, while the other has a larger Shannon index. It may therefore be a good idea to try a number of diversity indices to make sure that the diversity ordering is robust. A formal way of doing this is to define a family of diversity indices that depends on a single continuous parameter (Tothmeresz 1995).

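PAST uses the exponential of the so-called Renyi index, which depends upon a parameter alpha: $\left(\sum_i p_i^{\alpha}\right)^{1/(1-\alpha)}$, where $p_i = n_i/n$. For alpha=0, this function gives the total species number. For alpha=1 (taken as a limit) it gives the exponential of the Shannon index, exp(H), and for alpha=2 it gives the reciprocal of the dominance, 1/D, an index which behaves like the Simpson index.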

The program plots two such diversity profiles together. If the profiles cross, the diversities are non-comparable.
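The sketch below evaluates two diversity profiles on a grid of alpha values and reports whether they cross. The alpha = 1 case is handled as the limit exp(H). The grid, the function names and the crossing test (a sign change in the difference between the profiles at the grid points) are assumptions for illustration; a coarse grid can miss a crossing that falls between grid points.

```python
import math


def renyi_profile(counts, alphas):
    """Exponential of the Renyi index for each alpha in `alphas`."""
    n = sum(counts)
    p = [c / n for c in counts if c > 0]
    out = []
    for a in alphas:
        if abs(a - 1) < 1e-12:
            # limit alpha -> 1: exponential of the Shannon index
            out.append(math.exp(-sum(pi * math.log(pi) for pi in p)))
        else:
            out.append(sum(pi ** a for pi in p) ** (1 / (1 - a)))
    return out


def profiles_cross(a, b, alphas=(0, 0.5, 1, 2, 4)):
    """True if the profile difference changes sign somewhere on the alpha grid."""
    diffs = [x - y for x, y in zip(renyi_profile(a, alphas), renyi_profile(b, alphas))]
    return any(d > 0 for d in diffs) and any(d < 0 for d in diffs)
```

For example, a sample with four taxa but one strongly dominant taxon has the higher profile at alpha = 0 and the lower one at alpha = 2 when compared with an even two-taxon sample, so the two are reported as non-comparable.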