

 


Q-distance

Software for measuring the distance between distribution functions

Knut P. Lehre (2), Anne-Catherine WG Lehre (2) and Petter Laake (1)

1) Institute of Basic Medical Sciences, Department of Biostatistics, University of Oslo
2) Institute of Basic Medical Sciences, Department of Anatomy at the CMBN, University of Oslo

Contents

Introduction

Manual

Download and installation instructions

Example

Introduction

This is a short user manual for the q-distance software. Feel free to send questions or comments to this e-mail address: k.p.d.lehre@medisin.uio.no.

The software calculates the distance between two cumulative distribution functions. To be precise, let X and Y be two variables with cumulative distribution functions (CDFs) F and G, respectively. As a measure of the difference we study the distance function $\Delta(q)$, defined for each probability $q$ in (0, 1) as the difference between $F^{-1}(q)$ and $G^{-1}(q)$, where $F^{-1}$ and $G^{-1}$ are the inverses of the CDFs F and G, respectively.

The estimate of $\Delta(q)$ is obtained by replacing F and G with $F_m$ and $G_n$, where $F_m$ and $G_n$ are the empirical distribution functions, based on m and n observations, respectively. The confidence intervals are based on asymptotic results, and their derivations can be found in Laake et al. (1985), Biometrics, 41:515-523. The algorithm used in this program is based on the methods described there.
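As an illustration of the point estimate only, the following Python sketch computes the difference between two inverse empirical distribution functions at a given probability. It is not the program's own code: the function names are hypothetical, the direction of the subtraction (first argument minus second) is an assumption, and the inverse is taken as the smallest observation whose empirical CDF value reaches q, which is one common convention among several. The confidence intervals of Laake et al. (1985) are not reproduced here.

```python
import math


def empirical_quantile(sample, q):
    """Inverse empirical CDF: smallest observation x with F_m(x) >= q."""
    if not sample:
        raise ValueError("sample must not be empty")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    ordered = sorted(sample)
    # F_m at the k-th order statistic equals k/m, so we need the smallest
    # k with k/m >= q. Rounding guards against float noise such as
    # 0.7 * 10 == 7.000000000000001.
    k = max(1, math.ceil(round(q * len(ordered), 12)))
    return ordered[k - 1]


def q_distance(x_sample, y_sample, q):
    """Difference of the inverse empirical CDFs at q.

    Assumption: the sign convention is x minus y; the original program
    may use the opposite direction.
    """
    return empirical_quantile(x_sample, q) - empirical_quantile(y_sample, q)


# Made-up illustrative data, not taken from any real study.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 2.0, 3.0, 4.0]
for q in (0.25, 0.5, 0.75):
    print(q, q_distance(x, y, q))
```

With these made-up samples the differences at q = 0.25, 0.5 and 0.75 are 1.0, 2.0 and 3.0, which shows how the distance can vary along the distribution rather than being a single number.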

 

Download and installation instructions are given in the section "Download and installation" below.

Start the program by running the "q-distance.exe" file.


Manual

When the program is started, a brief manual text is displayed in the data panel. This text is also available from the help menu.

1) Use the "Open file" menu to open a text file with one data point on each line. The two data sets to be compared should be separated by a line containing an asterisk (*). The data is read into the “Data” panel.

2) Use the "Analyze" menu to perform analysis. The program will suggest a window size for the calculations. After the analysis, two inverse cumulative distribution graphs will be displayed in the “Inverse cumulative graphs” panel. The first data set (before the * in the data file) is shown in green, and the second data set is shown in red. The “Distance graph” panel shows the distance between the two inverse cumulative distributions in blue, and the 95 % confidence bands in black. This curve typically needs some smoothing.

3) Use the "Smooth distance graph" menu after analysis to perform LOWESS smoothing of the distance graph. Note: The distance graph and the confidence bands are smoothed individually. If the graphs become misaligned during smoothing, use "Fit distance confidence graphs" to align the confidence bands to the distance graph.

4) The "Save distance graph point data to file" menu saves the points of the distance graph ± 1.96 SD to a file.

    The "Save currently displayed graph to bitmap file" menu saves an image of the graph in the selected panel to a file.
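The input format described in step 1 can be checked or prepared outside the program. The Python sketch below reads text in that layout: one value per line, with the two data sets separated by a line containing only an asterisk. It is a hypothetical helper, not part of q-distance, and it assumes that blank lines are ignored and that values use a decimal point; the program itself may treat these cases differently.

```python
def parse_two_samples(text):
    """Split file contents into two lists of floats at the '*' line."""
    first, second = [], []
    current = first
    seen_separator = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line == "*":
            if seen_separator:
                raise ValueError("more than one '*' separator line")
            seen_separator = True
            current = second  # values after '*' belong to the second set
            continue
        current.append(float(line))  # raises ValueError on non-numeric text
    if not seen_separator:
        raise ValueError("no '*' separator line found")
    return first, second


# Made-up values for illustration only.
example = "1.5\n2.0\n3.25\n*\n0.5\n4.0\n"
a, b = parse_two_samples(example)
print(a, b)
```

To read an actual file, pass its contents to the function, for example `parse_two_samples(open(path).read())` with `path` pointing at the data file.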

 

The functions described above display the inverse cumulative distributions. In addition, there are functions to show the cumulative distributions themselves.

The menu “Simple analysis” shows a separate window with some simple statistics for the two data sets. The first data set is denoted “Y”, and the second “X”. Information from this window can be selected with the cursor and copied by right-clicking the mouse.


Download and installation

This software is currently only available for the Microsoft Windows operating system. A Linux version might be available in the future.

Download and unzip q-distance1.6.0.zip (MD5SUM: 9eadfcef624f827afded8f2d4b067b87) into a directory. The zip file contains three files. The program is started by running the file “q-distance.exe”. “Birth.txt” is an example data file. The file “winlowess.exe” is used by q-distance to perform LOWESS curve smoothing. Winlowess.exe is a standalone LOWESS curve smoothing program made by K.P.Lehre, based on GNU GPL licensed code from the BASE - BioArray Software Environment project. In accordance with the GPL, the source code for winlowess.exe is found in winlowess.zip.
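The MD5 sum given above can be used to check that the download is intact before unzipping. The following Python sketch is one way to do this with the standard `hashlib` module; the function names are hypothetical, and the file path in the usage note assumes the zip file was saved under its original name in the current directory.

```python
import hashlib

# Checksum as published on this page for q-distance1.6.0.zip.
EXPECTED_MD5 = "9eadfcef624f827afded8f2d4b067b87"


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("q-distance1.6.0.zip")` should return True for an undamaged download. MD5 detects accidental corruption but is not a strong guarantee against deliberate tampering.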


Example

Read in the dataset birth.txt by using the "Open file" menu. These data are taken from the Low Birth Weight Study in Hosmer, D.W. and Lemeshow, S., Applied Logistic Regression, Second Edition, 2000, John Wiley & Sons, page 24, and are used with permission from John Wiley & Sons. Here we study birth weight in relation to smoking. The two groups are non-smokers and smokers, respectively. The data file consists of one column, listing first the birth weights for non-smoking mothers and then those for smoking mothers, separated by a line containing an asterisk (*). Use the "Analyze" menu to perform the analysis and then the "Smooth distance graph" menu to perform smoothing. The distance function is displayed by clicking on “Distance graph”. This should display the following graph:

[Figure: smoothed distance graph for the birth weight data (image birth_smooth10)]

 

Editors: k.p.d.lehre@medisin.uio.no
Document created: 16.11.2005, certified: xx.xx.2005, last update: 06.02.2010
