The data structure and algorithms have been implemented in Java. The implementation has been focused on clearity and generality rather than computational speed. Although the main application is likely to be on DNA sequences, this is not assumed anywhere in the implementation.

Note that the Java executable should be executed using the Java option -Xmxmaxmem, e.g. java -Xmx4g -jar WordTable.jar, to set the maximal amount of memory Java is allowed to allocate: the default is usuall very low. Executing it with java rather than javaw, which may be the default file association of JAR files, also ensures that critical errors get reported on the console.

The Java implementation can run either on the command line or in a GUI. It can process sequence data in Fasta or Fastq formats into a kFM-index, which may then be pre-assembled or various statistics may be generated. The main computational parts of the kFM-index generation has been parallelised for increased speed.

Latest version

The latest available version of the kFM-index implementation is WordIndex version 0.8 (build 4).

For download: the Java executable JAR file, the Java source files, and documentation of the implementation.

Compared to the version discussed in the article, the implementation has been considerably improved. The following features have been added:

These improvement all influence the construction of the kFM-index from the sequence data.

Removing the memory limitation, enables processing of arbitrarily large blocks of k-mers in the first step of kFM-index construction: when the sequence data is split into blocks, kFM-indexes produced from each block, and these kFM-indexes then merged pairwise until a final kFM-index results. By increasing the block size, limited only by the available memory, the number of merge operations is reduced.

During the initial kFM-index construction from lists of in-edges (k-mers), the in-edge lists are generated from sequence data and then sorted. The sort procedure has been replaced by a faster one, and parallelised. The operation for merging two kFM-indexes has also been parallelised.

Updated benchmarks are available from the benchmarks page.

Older versions

Version 0.8 (build 4)
Java executable JAR file, source files, and documentation: Fixed a bug which caused a casting exception (in some Java versions?).
Version 0.8
Java executable JAR file, source files, and documentation: Various smaller improvements in the code, with some problems fixed, allowing a more "final" release of the parallelised version without the previous memory limitations in kFM-index construction.
Version 0.7 (build 37)
Java executable JAR file, source files: This version introduces parallelisation, and also contains a number of additional improvements in terms of kFM-index construction speed.
Version 0.5 (build 24) on which the article is based
Java executable JAR file, source files, and documentation.
Last modified August 08, 2014.