Geometrical analysis


Directions (one sample)

Typical application: Displaying and testing for random distribution of directional data
Assumptions: See below
Data needed: One column of directional (0-360) or orientational (0-180) data in degrees

Plots a rose diagram (polar histogram) of directions given in a column of degree values (0 to 360). Used for plotting current-oriented specimens, orientations of trackways, orientations of morphological features (e.g. terrace lines), etc.

By default, the 'mathematical' angle convention of anticlockwise from east is chosen. If you use the 'geographical' convention of clockwise from north, tick the box.

You can also choose whether to have the abundances proportional to radius in the rose diagram, or proportional to area (equal area).

The "Kernel density" option plots a circular kernel density estimate.

The mean angle takes circularity into account. The 95 percent confidence interval on the mean is estimated according to Fisher (1983). It assumes circular normal distribution, and is not accurate for very large variances (confidence interval larger than 45 degrees) or small sample sizes. The bootstrapped 95% confidence interval on the mean uses 5000 bootstrap replicates. The graphic uses the bootstrapped confidence interval.
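Below is a minimal Python sketch of the circular mean and a percentile bootstrap interval, with angles in degrees. The function names are hypothetical. The interval is built from percentiles of wrapped deviations from the sample mean. That construction is an assumption and may not match what PAST does.

```python
import math
import random

def circular_mean(degrees):
    """Mean direction (0-360 degrees) taken from the mean unit vector."""
    s = sum(math.sin(math.radians(a)) for a in degrees)
    c = sum(math.cos(math.radians(a)) for a in degrees)
    return math.degrees(math.atan2(s, c)) % 360.0

def bootstrap_mean_ci(degrees, n_boot=5000, seed=1):
    """Percentile bootstrap 95% interval for the mean angle (a sketch)."""
    rng = random.Random(seed)
    m = circular_mean(degrees)
    devs = []
    for _ in range(n_boot):
        sample = [rng.choice(degrees) for _ in degrees]
        # signed deviation of the replicate mean from m, wrapped to [-180, 180)
        devs.append((circular_mean(sample) - m + 180.0) % 360.0 - 180.0)
    devs.sort()
    lo = devs[int(0.025 * n_boot)]
    hi = devs[int(0.975 * n_boot) - 1]
    return (m + lo) % 360.0, (m + hi) % 360.0
```

Averaging unit vectors handles the wrap-around. For example, the mean of 350 and 10 degrees comes out at (almost exactly) 0 degrees rather than 180.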

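The R value (Rayleigh's spread) is given by:

$$R = \frac{1}{n}\sqrt{\left(\sum_{i=1}^{n}\cos\theta_i\right)^2 + \left(\sum_{i=1}^{n}\sin\theta_i\right)^2}$$

where θ_i are the n observed angles.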

R is further tested against a random distribution using Rayleigh's test for directional data (Davis 1986). Note that this procedure assumes evenly or unimodally distributed data - the test is not appropriate for bidirectional data. The p values are computed using an approximation given by Mardia (1972).
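The sketch below computes the R value and a Rayleigh p value. The p value comes from a standard closed-form approximation that approaches exp(-nR²) when R is small. This approximation is an assumption and may differ from the Mardia (1972) approximation PAST uses. The function name is hypothetical.

```python
import math

def rayleigh_test(degrees):
    """Return (R, p) for Rayleigh's test against a uniform distribution."""
    n = len(degrees)
    s = sum(math.sin(math.radians(a)) for a in degrees)
    c = sum(math.cos(math.radians(a)) for a in degrees)
    r_bar = math.hypot(s, c) / n      # mean resultant length R
    rn = n * r_bar                    # resultant length
    # closed-form approximation to the p value (not necessarily Mardia's)
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - rn * rn)) - (1 + 2 * n))
    return r_bar, min(p, 1.0)
```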

Rao's spacing test for uniform distribution uses probability tables published by Russell & Levitin (1996). A Chi-square test for uniform distribution is also available, with a user-defined number of bins (default 4).
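Both uniformity statistics are simple to compute. This hypothetical sketch returns the Chi-square statistic with its degrees of freedom, and Rao's spacing statistic U in degrees. The p value for U still has to be looked up in tables, such as those of Russell & Levitin (1996).

```python
def chi_square_uniform(degrees, bins=4, full_circle=360.0):
    """Chi-square statistic and degrees of freedom against equal bin counts."""
    width = full_circle / bins
    counts = [0] * bins
    for a in degrees:
        counts[int((a % full_circle) // width)] += 1
    expected = len(degrees) / bins
    chi2 = sum((o - expected) ** 2 / expected for o in counts)
    return chi2, bins - 1

def rao_spacing_u(degrees):
    """Rao's U: half the summed deviation of arc spacings from 360/n."""
    a = sorted(d % 360.0 for d in degrees)
    n = len(a)
    spacings = [a[i + 1] - a[i] for i in range(n - 1)] + [360.0 - a[-1] + a[0]]
    return 0.5 * sum(abs(t - 360.0 / n) for t in spacings)
```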

The 'Orientations' option allows analysis of linear orientations (0-180 degrees). The Rayleigh test is then carried out by a directional test on doubled angles (this trick is described by Davis 1986); the Chi-square uses four bins from 0-180 degrees; the rose diagram mirrors the histogram around the origin.
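A short, hypothetical helper illustrates the angle-doubling trick. It doubles the orientations, treats them as directions, and halves the resulting mean.

```python
import math

def mean_orientation(degrees):
    """Mean axial orientation (0-180) and R of the doubled angles."""
    s = sum(math.sin(math.radians(2 * a)) for a in degrees)
    c = sum(math.cos(math.radians(2 * a)) for a in degrees)
    doubled_mean = math.degrees(math.atan2(s, c)) % 360.0
    r_doubled = math.hypot(s, c) / len(degrees)
    return doubled_mean / 2.0, r_doubled
```

Orientations of 10 and 170 degrees give a mean orientation near 0 (equivalently 180) degrees. Treating the same values as directions would give 90 degrees.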

Directions (two samples)

Typical application: Testing for equal mean angle in two directional or orientational samples
Assumptions: Concentration value kappa > 1.0, unimodal (von Mises) distribution, similar R values
Data needed: Two columns of directional (0-360) or orientational (0-180) data in degrees

This module performs the Watson-Williams test for equal mean angle in two samples, with corrections due to Mardia (1972). The concentration parameter kappa is a maximum-likelihood estimate, computed analytically, and it should be larger than 1.0 for accurate testing. In addition, the test assumes similar angular variances (R values).
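Below is a sketch of the Watson-Williams F statistic in a commonly used form, F = (1 + 3/(8 kappa)) (N - 2) (R1 + R2 - R) / (N - R1 - R2). Here R1, R2 and R are the resultant lengths of the two samples and of the pooled sample. Kappa is estimated with a standard piecewise approximation to the maximum-likelihood estimator, not with PAST's analytic computation; that substitution is an assumption. F is compared with an F distribution on 1 and N - 2 degrees of freedom. The sketch does not handle perfectly concentrated samples (R = N). All names are hypothetical.

```python
import math

def _resultant(degrees):
    s = sum(math.sin(math.radians(a)) for a in degrees)
    c = sum(math.cos(math.radians(a)) for a in degrees)
    return math.hypot(s, c)

def _kappa_approx(r):
    """Approximate ML estimate of von Mises kappa from mean resultant length r."""
    if r < 0.53:
        return 2 * r + r ** 3 + 5 * r ** 5 / 6
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1 - r)
    return 1 / (r ** 3 - 4 * r ** 2 + 3 * r)

def watson_williams_f(a, b):
    """Return (F, df1, df2, kappa) for two samples of angles in degrees."""
    n = len(a) + len(b)
    r1, r2 = _resultant(a), _resultant(b)
    r_all = _resultant(list(a) + list(b))
    kappa = _kappa_approx((r1 + r2) / n)   # pooled within-sample concentration
    correction = 1 + 3 / (8 * kappa)
    f = correction * (n - 2) * (r1 + r2 - r_all) / (n - r1 - r2)
    return f, 1, n - 2, kappa
```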

Circular correlation

Typical application: Testing for correlation between two directional or orientational variates
Assumptions: "Large N"
Data needed: Two columns of directional (0-360) or orientational (0-180) data in degrees

This module uses the circular correlation procedure and parametric significance test of Jammalamadaka & Sengupta (2001).
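The sketch below computes the Jammalamadaka & Sengupta circular correlation coefficient, r = Σ sin(a_i - ā) sin(b_i - b̄) / sqrt(Σ sin²(a_i - ā) Σ sin²(b_i - b̄)), where ā and b̄ are the mean directions. The function name is hypothetical, and the parametric significance test is omitted.

```python
import math

def _mean_direction(radians):
    return math.atan2(sum(map(math.sin, radians)), sum(map(math.cos, radians)))

def circular_correlation(a_deg, b_deg):
    """Circular correlation coefficient for paired angles in degrees."""
    a = [math.radians(x) for x in a_deg]
    b = [math.radians(y) for y in b_deg]
    ma, mb = _mean_direction(a), _mean_direction(b)
    sa = [math.sin(x - ma) for x in a]
    sb = [math.sin(y - mb) for y in b]
    num = sum(x * y for x, y in zip(sa, sb))
    den = math.sqrt(sum(x * x for x in sa) * sum(y * y for y in sb))
    return num / den
```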

Nearest neighbour point pattern analysis

Typical application: Testing for clustering or overdispersion of two-dimensional position values
Assumptions: Elements small compared to their distances, mainly convex domain, N > 50
Data needed: Two columns of x/y positions

Point distribution statistics using nearest neighbour analysis (modified from Davis 1986). The area is estimated either by the smallest enclosing rectangle or using the convex hull, which is the smallest convex polygon enclosing the points. Both are inappropriate for points in very concave domains. Two different edge effect adjustment methods are available: wrap-around ("torus") and Donnelly's correction.

The probability that the distribution is random (Poisson process, giving an exponential nearest neighbour distribution) is presented, together with the R value:

$$R = \frac{d}{0.5\sqrt{A/N}}$$

where d is the observed mean distance between nearest neighbours, A is the area of the convex hull, and N is the number of points. Clustered points give R<1, Poisson patterns give R~1, while overdispersed points give R>1.
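Below is a brute-force sketch of the R value. The caller supplies the area A (for example, from the convex hull). Unlike PAST's torus and Donnelly options, the sketch applies no edge correction. The function name is hypothetical.

```python
import math

def nearest_neighbour_r(points, area):
    """R = mean nearest-neighbour distance / (0.5 * sqrt(A / N)), no edge correction."""
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    d_mean = total / n
    return d_mean / (0.5 * math.sqrt(area / n))
```

On a regular 10 x 10 grid with unit spacing and A = 100, every nearest-neighbour distance is 1. That gives R = 1 / (0.5 sqrt(100/100)) = 2, indicating overdispersion.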

The orientations (0-180 degrees) and lengths of lines between nearest neighbours are also included. The orientations can be subjected to directional analysis to test whether the points are organised along lineaments.

Applications of this module include spatial ecology (are in-situ brachiopods clustered?) and morphology (are trilobite tubercles overdispersed?).

Ripley's K point pattern analysis

Typical application: Testing for clustering or overdispersion of two-dimensional position values
Assumptions: Rectangular domain
Data needed: Two columns of x/y positions

Ripley's K (Ripley 1979) is the average point density as a function of distance from every point. It is useful when point pattern characteristics change with scale, e.g. overdispersion over small distances but clustering over large distances.

For complete spatial randomness (CSR), R(d) is expected to increase as the square of distance. The L(d) function is the square root of R(d)/pi. For CSR, L(d)=d, and L(d)-d=0. An approximate 95% confidence interval for L(d)-d under CSR is given by 1.42 sqrt(A)/N, where A is the area and N is the number of points. Ripley's edge correction is included.
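As a rough illustration, the sketch below estimates K(d), the quantity written R(d) above, as A/N² times the number of ordered point pairs no farther apart than d, and returns L(d) - d. It omits Ripley's edge correction, so it is biased near the domain boundary. The function names are hypothetical.

```python
import math

def ripley_k_l(points, area, distances):
    """Naive (d, K(d), L(d) - d) triples, without Ripley's edge correction."""
    n = len(points)
    results = []
    for d in distances:
        # ordered pairs (i, j), i != j, with separation <= d
        count = sum(1 for i, p in enumerate(points)
                    for j, q in enumerate(points) if i != j and math.dist(p, q) <= d)
        k = area * count / (n * n)
        results.append((d, k, math.sqrt(k / math.pi) - d))
    return results

def csr_band(area, n):
    """Approximate 95% band for L(d) - d under CSR, as given in the text."""
    return 1.42 * math.sqrt(area) / n
```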

Area

For the correct calculation of Ripley's K, the area must be known. In the first run, the area is computed using the smallest bounding rectangle, but this often overestimates the real area, so the area can then be adjusted by the user. An overestimated area will typically show up as a strong overall linear trend with positive slope for L(d)-d.

Fractal dimension

The fractal dimension (if any) can be estimated as the asymptotic linear slope in a log-log plot of R(d). For CSR, the log-log slope should be 2.0. Fractals should have slopes less than 2.
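The slope is an ordinary least-squares fit on log-transformed values. In this hypothetical sketch, the caller picks the distance range over which the log-log plot looks asymptotically linear.

```python
import math

def loglog_slope(d_values, k_values):
    """Least-squares slope of log K(d) against log d."""
    xs = [math.log(d) for d in d_values]
    ys = [math.log(k) for k in k_values]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

Under CSR, K(d) = pi d², so the fitted slope is 2.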

Multivariate allometry

Typical application: Finding and testing for allometry in a multivariate morphometric data set
Assumptions: None
Data needed: A multivariate data set with variables (distance measurements) in columns, specimens in rows

This advanced method for investigating allometry in a multivariate data set is based on Jolicoeur (1963) with extensions by Kowalewski et al. (1997). The data are (automatically) log-transformed and subjected to PCA. The first principal component (PC1) is then regarded as a size axis (this is only valid if the variation accounted for by PC1 is large, say more than 80%). The allometric coefficient for each original variable is estimated by dividing the PC1 loading for that variable by the mean PC1 loading over all variables.

95% confidence intervals for the allometric coefficients are estimated by bootstrapping specimens. 2000 bootstrap replicates are made.
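Below is a compact NumPy sketch of the procedure. It assumes strictly positive measurements and no missing values. The function names are hypothetical, and forming the intervals with a percentile bootstrap is an assumption.

```python
import numpy as np

def allometric_coefficients(data):
    """PC1 loadings of log-transformed data divided by their mean loading."""
    logs = np.log(np.asarray(data, dtype=float))
    eigvals, eigvecs = np.linalg.eigh(np.cov(logs, rowvar=False))  # ascending
    pc1 = eigvecs[:, -1]
    explained = eigvals[-1] / eigvals.sum()   # should be large, say > 0.8
    return pc1 / pc1.mean(), explained         # dividing by the mean cancels PC1's sign

def bootstrap_ci(data, n_boot=2000, seed=0):
    """Percentile 95% intervals from resampling specimens (rows)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    reps = [allometric_coefficients(data[rng.integers(0, len(data), len(data))])[0]
            for _ in range(n_boot)]
    return np.percentile(reps, [2.5, 97.5], axis=0)
```

With variables s and s², the log loadings are proportional to (1, 2), which gives coefficients of 2/3 and 4/3.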

Missing data is supported by column average substitution.

Fourier shape analysis

Typical application: Analysis of fossil outline shape (2D)
Assumptions: Shape expressible in polar coordinates, sufficient number of digitized points to capture features
Data needed: Digitized x/y coordinates around an outline. Specimens in rows, coordinates of alternating x and y values in columns (see Procrustes fitting below).

Accepts X-Y coordinates digitized around an outline. More than one shape (row) can be simultaneously analyzed. Points do not need to be totally evenly spaced. The shape must be expressible as a unique function in polar co-ordinates, that is, any straight line radiating from the centre of the shape must cross the outline only once.

The algorithm follows Davis (1986). The origin for the polar coordinate system is found by numerical approximation to the centroid. 128 points are then produced at equal angular increments around the outline, through linear interpolation. The centroid is then re-computed, and the radii normalized (size is thus removed from the analysis). The cosine and sine components are given for the first twenty harmonics, but note that only N/2 harmonics are 'valid', where N is the number of digitized points. The coefficients can be copied to the main spreadsheet for further analysis (e.g. by PCA).
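The core of the procedure can be sketched with NumPy. Two simplifications here are assumptions. First, the centroid is approximated by the mean of the digitized points and is not re-computed after resampling. Second, radii are normalized by their mean. The cosine and sine coefficients of harmonic n then come from the discrete Fourier transform of the 128 resampled radii. The function name is hypothetical.

```python
import numpy as np

def polar_fourier(x, y, n_points=128, n_harmonics=20):
    """Cosine/sine coefficients of the radius function r(theta), harmonics 0..n."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cx, cy = x.mean(), y.mean()                     # simple centroid approximation
    theta = np.arctan2(y - cy, x - cx)
    radius = np.hypot(x - cx, y - cy)
    grid = np.linspace(0, 2 * np.pi, n_points, endpoint=False)
    r = np.interp(grid, theta, radius, period=2 * np.pi)  # linear, wraps around
    r = r / r.mean()                                # remove size
    spec = np.fft.rfft(r)
    cos_coef = 2 * spec.real / n_points
    sin_coef = -2 * spec.imag / n_points
    cos_coef[0] /= 2                                # constant term = mean radius
    return cos_coef[:n_harmonics + 1], sin_coef[:n_harmonics + 1]
```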

The 'Shape view' window allows graphical viewing of the Fourier shape approximation(s).

Elliptic Fourier shape analysis

Typical application: Analysis of fossil outline shape (2D)
Assumptions: Sufficient number of digitized points to capture features
Data needed: Digitized x/y coordinates around an outline. Specimens in rows, coordinates of alternating x and y values in columns (see Procrustes fitting below).

More than one shape (row) can be simultaneously analyzed.

Elliptic Fourier shape analysis is in some respects superior to simple Fourier shape analysis. One advantage is that the algorithm can handle complicated shapes which may not be expressible as a unique function in polar co-ordinates. Elliptic Fourier analysis is now a standard method of outline analysis. The algorithm used in PAST is described in Ferson et al. (1985).

Cosine and sine components of x and y increments along the outline for the first 30 harmonics are given, but only the first N/2 harmonics should be used, where N is the number of digitized points. Size and positional translation are normalized away, and do not enter in the coefficients. However, no attempt is made to standardize rotation or starting point, so all specimens should be measured in a standard orientation. The coefficients can be copied to the main spreadsheet for further analysis (e.g. by PCA).
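For comparison, below is a sketch of elliptic Fourier coefficients in the widely used Kuhl & Giardina (1982) formulation. It is an assumption that this matches PAST's implementation in detail. The size normalization described above is left out. Consecutive points must be distinct so that no segment has zero length. The function name is hypothetical.

```python
import numpy as np

def elliptic_fourier(x, y, n_harmonics=30):
    """Rows of (a_n, b_n, c_n, d_n) for harmonics 1..n_harmonics of a closed outline."""
    pts = np.column_stack([x, y]).astype(float)
    d = np.diff(np.vstack([pts, pts[:1]]), axis=0)   # x/y increments, outline closed
    dt = np.hypot(d[:, 0], d[:, 1])                   # segment lengths
    t = np.concatenate([[0.0], np.cumsum(dt)])
    T = t[-1]                                         # perimeter
    coeffs = np.zeros((n_harmonics, 4))
    for n in range(1, n_harmonics + 1):
        k = T / (2 * n * n * np.pi ** 2)
        phi = 2 * n * np.pi * t / T
        dcos = np.cos(phi[1:]) - np.cos(phi[:-1])
        dsin = np.sin(phi[1:]) - np.sin(phi[:-1])
        coeffs[n - 1] = k * np.array([
            np.sum(d[:, 0] / dt * dcos), np.sum(d[:, 0] / dt * dsin),
            np.sum(d[:, 1] / dt * dcos), np.sum(d[:, 1] / dt * dsin)])
    return coeffs
```

A unit circle traced anticlockwise from (1, 0) gives a first harmonic close to (1, 0, 0, 1) and near-zero higher harmonics.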

The 'Shape view' window allows graphical viewing of the elliptic Fourier shape approximation(s).

Eigenshape analysis (2D)

Typical application: Analysis of fossil outline shape (2D)
Assumptions: Sufficient number of digitized points to capture features
Data needed: Digitized x/y coordinates around an outline. Specimens in rows, coordinates of alternating x and y values in columns (see Procrustes fitting below).

Eigenshapes are principal components of outlines. The scatter plot of outlines in principal component space can be shown, and linear combinations of the eigenshapes themselves can be visualized.

The implementation in PAST is partly based on MacLeod (1999). It finds the optimal number of equally spaced points around the outline using an iterative search, so the original points need not be equally spaced. The eigenanalysis is based on the covariance matrix of the non-normalized turning angle increments around the outlines. The algorithm does not assume a closed curve, and the endpoints are therefore not constrained to coincide in the reconstructed shapes. Landmark-registered eigenshape analysis is not included. All outlines must start at the 'same' point.
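The turning-angle representation and its eigenanalysis can be sketched as follows. The sketch assumes the outlines have already been resampled to the same number of equally spaced points; the iterative search for the optimal number of points is not shown. Function names are hypothetical.

```python
import numpy as np

def turning_increments(x, y):
    """Turning-angle increments (radians) between successive segments of an open curve."""
    directions = np.arctan2(np.diff(y), np.diff(x))
    inc = np.diff(directions)
    return (inc + np.pi) % (2 * np.pi) - np.pi      # wrap into [-pi, pi)

def eigenshapes(outlines):
    """PCA (covariance matrix) of turning-angle increments, largest eigenvalue first.
    All outlines need the same point count and a common starting point."""
    phi = np.array([turning_increments(x, y) for x, y in outlines])
    eigvals, eigvecs = np.linalg.eigh(np.cov(phi, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = (phi - phi.mean(axis=0)) @ eigvecs
    return eigvals, eigvecs, scores
```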

Procrustes and Bookstein fitting (2D or 3D)

Typical application: Standardization of morphometrical landmark coordinates
Assumptions: None
Data needed: Digitized x/y or x/y/z landmark coordinates. Specimens in rows, coordinates with alternating x and y (or x/y/z) values in columns.

The Procrustes option in the Transform menu will transform your measured coordinates to Procrustes coordinates. There is also a menu choice for Bookstein coordinates. Specimens go in different rows and landmarks along each row. If you have three specimens with four landmarks in 2D, your data should look as follows:

x1 y1 x2 y2 x3 y3 x4 y4
x1 y1 x2 y2 x3 y3 x4 y4
x1 y1 x2 y2 x3 y3 x4 y4

For 3D the data will be similar, but with additional columns for z.

Landmark data in this format could be analyzed directly with the multivariate methods in PAST, but it is recommended to standardize to Procrustes coordinates by removing position, size and rotation. A further transformation to Procrustes residuals (approximate tangent space coordinates) is achieved by selecting 'Subtract mean' in the Edit menu. Note: You must always convert to Procrustes coordinates first, then to Procrustes residuals.

Here is a typical sequence of operations for landmark analysis:

A thorough description of Procrustes and tangent space coordinates is given by Dryden & Mardia (1998). The algorithms for Procrustes fitting are from Rohlf & Slice (1990) (2D) and Dryden & Mardia (1998) (3D).
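Below is a minimal NumPy sketch of generalized Procrustes fitting. Each spreadsheet row is reshaped into a landmarks-by-coordinates array; the same code also handles 3D arrays. Rotations come from a singular value decomposition, with reflections excluded. This is a standard approach but not necessarily identical to the Rohlf & Slice (1990) algorithm. All names are hypothetical.

```python
import numpy as np

def row_to_landmarks(row, dim=2):
    """Reshape one row (x1 y1 x2 y2 ...) into a k x dim array."""
    return np.asarray(row, dtype=float).reshape(-1, dim)

def center_and_scale(conf):
    """Remove position and scale to unit centroid size."""
    c = conf - conf.mean(axis=0)
    return c / np.sqrt((c ** 2).sum())

def rotate_onto(conf, target):
    """Least-squares rotation of conf onto target, reflections excluded."""
    u, _, vt = np.linalg.svd(conf.T @ target)
    d = np.sign(np.linalg.det(u @ vt))
    fix = np.diag([1.0] * (conf.shape[1] - 1) + [d])
    return conf @ (u @ fix @ vt)

def procrustes_fit(configs, max_iter=100, tol=1e-12):
    """Iteratively rotate all configurations onto the current mean shape."""
    fitted = [center_and_scale(np.asarray(c, dtype=float)) for c in configs]
    mean = fitted[0]
    for _ in range(max_iter):
        fitted = [rotate_onto(c, mean) for c in fitted]
        new_mean = center_and_scale(np.mean(fitted, axis=0))
        converged = np.sum((new_mean - mean) ** 2) < tol
        mean = new_mean
        if converged:
            break
    return np.array(fitted), mean
```

In this sketch, the Procrustes residuals are then `fitted - mean`.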

Bookstein fitting has a similar function to Procrustes fitting, but simply standardizes position, rotation and scale by forcing the first two landmarks onto the coordinates (0,0) and (1,0). It is not in common use today. Bookstein fitting is only implemented for 2D.
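Because only two landmarks are involved, Bookstein coordinates are easy to compute with complex arithmetic. Subtracting the first landmark and dividing by the baseline vector removes position, rotation and scale in one step. A hypothetical helper:

```python
def bookstein(landmarks):
    """Send landmark 1 to (0, 0) and landmark 2 to (1, 0); return (x, y) tuples."""
    z = [complex(x, y) for x, y in landmarks]
    base = z[1] - z[0]                      # baseline vector (must be non-zero)
    out = [(w - z[0]) / base for w in z]
    return [(w.real, w.imag) for w in out]
```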

Missing data is supported by column average substitution.

Shape PCA

This is an option in the Principal Components module (Multivar menu). PCA on landmark data can be carried out as ordinary PCA on Procrustes coordinates for 2D or 3D (see above), but for 2D landmark data some extra functionality is available in the PCA module by choosing Shape PCA. The var-covar option is then enforced, and the 'Shape deform (2D)' button is enabled. This button lets you view the displacement of landmarks from the mean shape (plotted as points or symbols) in the direction of each principal component, which helps in interpreting the components. The displacements are plotted as lines (vectors).

Grid

The "Grid" option visualizes the deformations as thin-plate splines. These splines are the relative warps with the uniform (affine) component included, and with alpha=0. Relative warps are also available separately in the "Geomet" menu, but there the uniform component is not included.

Thin-plate spline transformation grids

Typical application: Visualization of shape change
Assumptions: None
Data needed: Digitized x/y landmark coordinates. Specimens in rows, coordinates of alternating x and y values in columns. Procrustes standardization recommended.

The first specimen (first row) is taken as a reference, with an associated square grid. The warps from this to all other specimens can be viewed. You can also choose the mean shape as the reference.

The 'Expansion factors' option will display the area expansion (or contraction) factor around each landmark in yellow numbers, indicating the degree of local growth. This is computed using the Jacobian of the warp. Also, the expansions are colour-coded for all grid elements, with green for expansion and purple for contraction.

At each landmark, the principal strains can also be shown, with the major strain in black and minor strain in brown. These vectors indicate directional stretching.
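The warp, the expansion factor and the principal strains can be sketched together with NumPy. Three choices are assumptions about one reasonable implementation: the kernel U = r² log r², a finite-difference Jacobian, and taking the principal strains as the singular values of the Jacobian. Function names are hypothetical.

```python
import numpy as np

def _kernel(d2):
    """TPS kernel U = r^2 log r^2 on squared distances, with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(d2 > 0, d2 * np.log(d2), 0.0)

def fit_tps(src, dst):
    """Solve for non-affine weights w (n x 2) and affine part a (3 x 2)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    K = _kernel(((src[:, None, :] - src[None, :, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    sol = np.linalg.solve(L, rhs)
    return sol[:n], sol[n:]

def warp(pts, src, w, a):
    """Map points (m x 2) through the fitted spline."""
    pts = np.atleast_2d(np.asarray(pts, float))
    d2 = ((pts[:, None, :] - np.asarray(src, float)[None, :, :]) ** 2).sum(-1)
    return a[0] + pts @ a[1:] + _kernel(d2) @ w

def local_deformation(pt, src, w, a, h=1e-5):
    """Expansion factor det(J), plus major and minor strains (singular values of J)."""
    pt = np.asarray(pt, float)
    J = np.empty((2, 2))
    for k in range(2):
        step = np.zeros(2)
        step[k] = h
        J[:, k] = (warp(pt + step, src, w, a)[0] - warp(pt - step, src, w, a)[0]) / (2 * h)
    major, minor = np.linalg.svd(J, compute_uv=False)
    return np.linalg.det(J), major, minor
```

A uniform doubling of a square landmark configuration gives an expansion factor of 4 everywhere, with both principal strains equal to 2.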

A description of thin-plate spline transformation grids is given by Dryden & Mardia (1998).

Partial warps

From the thin-plate spline window, you can choose to see the partial warps for a particular spline deformation. The first partial warp will represent some long-range (large scale) deformation of the grid, while higher-order warps will normally be connected with more local deformations. The affine component of the warp (also known as zeroth warp) represents linear translation, scaling, rotation and shearing. In the present version of PAST you cannot view the principal warps.
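Partial warps are conventionally built from the eigenvectors of the bending energy matrix, known as the principal warps. The affine component corresponds to that matrix's three zero eigenvalues. Below is a sketch of the eigenanalysis under the same kernel assumption (U = r² log r²). The function names are hypothetical. Eigenvectors are returned with the smallest non-zero bending energy, that is the largest-scale warp, first.

```python
import numpy as np

def bending_energy_matrix(src):
    """Upper-left n x n block of the inverse thin-plate spline system matrix."""
    src = np.asarray(src, float)
    n = len(src)
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        K = np.where(d2 > 0, d2 * np.log(d2), 0.0)
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    return np.linalg.inv(L)[:n, :n]

def principal_warps(src):
    """Non-zero eigenvalues (ascending) and eigenvectors of the bending energy matrix."""
    eigvals, eigvecs = np.linalg.eigh(bending_energy_matrix(src))
    keep = eigvals > 1e-10 * eigvals.max()   # drop the three affine (zero) eigenvalues
    return eigvals[keep], eigvecs[:, keep]
```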

When you increase the amplitude factor from zero, the original landmark configuration and a grid will be progressively deformed according to the selected partial warp.

Partial warp scores

From the thin-plate spline window, you can also choose to see the partial warp scores of all the specimens. Each partial warp score has two components (x and y), and the scores are therefore presented in scatter plots.

Relative warps

Typical application: Ordination of a set of shapes
Assumptions: None
Data needed: Digitized x/y landmark coordinates. Specimens in rows, coordinates of alternating x and y values in columns. Procrustes standardization recommended.

The relative warps can be viewed as the principal components of the set of thin-plate transformations from the mean shape to each of the shapes under study. They provide an alternative to direct PCA of the landmarks (see Shape PCA above).

The parameter alpha can be set to one of three values:

The relative warps are ordered according to importance, and the first and second warps are usually the most informative. Note that the percentage values of the eigenvalues are relative to the total non-affine part of the transformation - the affine part is not included (see Shape PCA for relative warps with the affine component included).

The relative warps are visualized with thin-plate spline transformation grids. When you increase or decrease the amplitude factor away from zero, the original landmark configuration and grid will be progressively deformed according to the selected relative warp.

The relative warp scores of pairs of consecutive relative warps can be shown in scatter plots, and all scores can be shown in a numerical matrix.

The algorithm for computing the relative warps is taken from Dryden & Mardia (1998).

Size from landmarks (2D or 3D)

Typical application: Size estimation from landmarks
Assumptions: None
Data needed: Digitized x/y or x/y/z landmark coordinates. Specimens in rows, coordinates with alternating x and y (and z for 3D) values in columns. Must not be Procrustes fitted or normalized for size!

Calculates the centroid size for each specimen (Euclidean norm of the distances from all landmarks to the centroid).

The values in the 'Normalized' column are centroid sizes divided by the square root of the number of landmarks - this might be useful for comparing specimens with different numbers of landmarks.

Normalize size

The 'Normalize size' option in the Transform menu allows you to remove size by dividing all coordinate values by the centroid size for each specimen. For 2D data you may instead use Procrustes coordinates, which are also normalized with respect to size.

See Dryden & Mardia (1998), pp. 23-26.
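Centroid size, its normalized form and size normalization can be sketched in plain Python. Landmarks are tuples of 2 or 3 coordinates, and the function names are hypothetical.

```python
import math

def centroid_size(landmarks):
    """Square root of the summed squared distances from the landmarks to their centroid."""
    k, dim = len(landmarks), len(landmarks[0])
    c = [sum(p[i] for p in landmarks) / k for i in range(dim)]
    return math.sqrt(sum((p[i] - c[i]) ** 2 for p in landmarks for i in range(dim)))

def normalized_centroid_size(landmarks):
    """Centroid size divided by the square root of the number of landmarks."""
    return centroid_size(landmarks) / math.sqrt(len(landmarks))

def normalize_size(landmarks):
    """Divide every coordinate by the specimen's centroid size."""
    cs = centroid_size(landmarks)
    return [tuple(v / cs for v in p) for p in landmarks]
```

For a unit square, every corner lies sqrt(0.5) from the centroid, so the centroid size is sqrt(4 x 0.5) = sqrt(2).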

Distance from landmarks (2D or 3D)

Typical application: Calculating distances between two landmarks
Assumptions: None
Data needed: Digitized x/y or x/y/z landmark coordinates. Specimens in rows, coordinates with alternating x and y (and z for 3D) values in columns. May or may not be Procrustes fitted or normalized for size.

Calculates the Euclidean distances between two fixed landmarks for one or many specimens. You must choose two landmarks - these are named according to the name of the first column for the landmark (x value).

All distances from landmarks (EDMA)

Typical application: Finding distances between all pairs of landmarks
Assumptions: None
Data needed: Digitized x/y or x/y/z landmark coordinates. Specimens in rows, coordinates with alternating x and y (and z for 3D) values in columns. May or may not be Procrustes fitted or normalized for size.

This function will replace the landmark data in the data matrix with a data set consisting of distances between all pairs of landmarks, with one specimen per row. The number of pairs is N(N-1)/2 for N landmarks. This transformation will allow multivariate analysis of distance data, which are not sensitive to rotation or translation of the original specimens, so a Procrustes fitting is not mandatory before such analysis. Using distance data also allows log-transformation, and analysis of fit to the allometric equation for pairs of distances.
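For one specimen, the transformation amounts to listing all N(N-1)/2 pairwise distances. A hypothetical helper:

```python
import math
from itertools import combinations

def all_pair_distances(landmarks):
    """Euclidean distances for every pair of landmarks (2D or 3D), in pair order."""
    return [math.dist(p, q) for p, q in combinations(landmarks, 2)]
```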

Missing data is supported by column average substitution.

Landmark linking

This function in the Geomet menu allows the selection of any pairs of landmarks to be linked with lines in the morphometric plots (thin-plate splines, partial and relative warps, etc.), to improve readability. The landmarks must be present in the main spreadsheet before links can be defined.

Pairs of landmarks are selected or deselected by clicking in the symmetric matrix. The set of links can also be saved in a text file. Note that there is little error checking in this module.

Burnaby size removal

This function in the Transform menu will project your multivariate data set of measured distances onto a space orthogonal to the first principal component. Burnaby's method may (or may not!) remove isometric size from the data, for further "size-free" data analysis. The "Allometric" option will log-transform the data prior to projection, thus conceivably also removing allometric size-dependent shape variation from the data. Note that the implementation in PAST does not center the data within groups - it assumes that all specimens (rows) belong to one group.
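The projection can be sketched with NumPy. Taking PC1 from the covariance matrix of the (optionally log-transformed) data is an assumption about the details, and the helper name is hypothetical.

```python
import numpy as np

def burnaby_projection(data, allometric=False):
    """Project rows onto the subspace orthogonal to the first principal component."""
    X = np.asarray(data, dtype=float)
    if allometric:
        X = np.log(X)                               # 'Allometric' option
    v = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]   # unit PC1 vector
    return X - np.outer(X @ v, v)                   # remove the component along PC1
```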

Gridding (spatial interpolation)

Typical application: Spatial interpolation of scattered data points onto a regular grid
Assumptions: Some degree of smoothness
Data needed: Three columns with position (x,y) and corresponding data values

Gridding (spatial interpolation) allows the production of a map showing a continuous spatial estimate of some variate such as fossil abundance or thickness of a rock unit, based on scattered data points. The user can specify the size of the grid (number of rows and columns), but in the present version the spatial coverage of the map is generated automatically based on the positions of data points (the map will always be square).

A least-squares linear surface (trend) is automatically fitted to the data, removed prior to gridding and finally added back in. This is primarily useful for the semivariogram modelling and the kriging method.
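Fitting and removing the linear trend is an ordinary least-squares plane fit. Below is a NumPy sketch with a hypothetical function name.

```python
import numpy as np

def remove_linear_trend(x, y, z):
    """Fit z = c0 + c1*x + c2*y; return coefficients and residuals to be gridded."""
    A = np.column_stack([np.ones(len(x)), x, y])
    coef, *_ = np.linalg.lstsq(A, np.asarray(z, float), rcond=None)
    residuals = np.asarray(z, float) - A @ coef
    return coef, residuals
```

After gridding the residuals, the trend is added back at each grid node as c0 + c1*x + c2*y.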

Three algorithms are available:

Moving average
The value at a grid node is simply the average of the N closest data points, as specified by the user (the default is to use all data points). The points are given weight in inverse proportion to distance. This algorithm is simple and will not always give good (smooth) results. One advantage is that the interpolated values will never go outside the range of the data points.
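Below is a plain-Python sketch of the moving-average interpolator, with weights of 1/distance as described. The function name is hypothetical. Returning the data value directly when a grid node coincides with a data point is an assumption.

```python
import math

def moving_average(xq, yq, points, values, n_closest=None):
    """Inverse-distance weighted average of the n closest data points (all by default)."""
    near = sorted((math.hypot(px - xq, py - yq), v) for (px, py), v in zip(points, values))
    if n_closest is not None:
        near = near[:n_closest]
    if near[0][0] == 0.0:
        return near[0][1]          # grid node lies exactly on a data point
    weights = [1.0 / d for d, _ in near]
    return sum(w * v for w, (_, v) in zip(weights, near)) / sum(weights)
```

The result is a weighted average with positive weights, so it always lies within the range of the data values used.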

Thin-plate spline
Maximally smooth interpolator. Can overshoot in the presence of sharp bends in the surface.

Kriging
The user is required to specify a model for the semivariogram, by choosing one of three models (spherical, exponential or Gaussian) and corresponding parameters to fit the empirical semivariances as well as possible. The semivariogram is computed within each of a number of bins. Using the histogram option, choose a number of bins so that each bin (except possibly the rightmost ones) contains at least 30 distances. See e.g. Davis (1986) for more information.

The kriging procedure also provides an estimate of standard errors across the map (this depends on the semivariogram model being accurate). Kriging in PAST does not provide for anisotropic semivariance.
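The empirical semivariogram and the spherical model can be sketched as follows. Three details are assumptions for illustration: the equal-width binning scheme, the parameter names (nugget, sill, range_), and the convention that the sill includes the nugget. The pair count returned for each bin can be checked against the suggested minimum of 30 distances.

```python
import math

def empirical_semivariogram(points, values, n_bins, max_lag):
    """Return (bin centre, semivariance or None, pair count) for each lag bin."""
    width = max_lag / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            if h < max_lag:
                b = min(int(h // width), n_bins - 1)
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    return [((b + 0.5) * width, sums[b] / (2 * counts[b]) if counts[b] else None, counts[b])
            for b in range(n_bins)]

def spherical_model(h, nugget, sill, range_):
    """Spherical semivariogram model; the sill here includes the nugget."""
    if h == 0:
        return 0.0
    if h >= range_:
        return sill
    u = h / range_
    return nugget + (sill - nugget) * (1.5 * u - 0.5 * u ** 3)
```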
