Benchmarks

Benchmarks are organised into units. First, the benchmarked data sets are presented. The main results relate to the memory use of the final kFM-index, and the processing times of the pre-assembly in which uniquely determined paths of the de Bruijn graph are generated. Finally, the time and memory used in kFM-index construction from raw sequence data is discussed, which depends on the version of the implementation.

Benchmarking setup

Data sets used for benchmarking

The following sequence read data sets have been used for benchmarking.

Genome Sequence ref. Reads
E. coli str. K-12 MG1655 SRR001665/SRX000429 21 M×36 nt
C. elegans str. N2 SRR065390/SRX026594 68 M×100 nt
Soil sample SRR444039/SRX128885 37 M×76 nt
Simulated read data:
1 Mbp genome, 0.1% SNP
No read errors
0.1% read errors
1% read errors
300 k×100 nt

The reads are analysed without error correction, but with optional quality filtering in which nucleotides with quality score below 30 are removed.

The simulated data are based on a randomly generated 1 Mbp sequence, which is made diploid with 0.1% single nucleotide polymorphisms. Reads of length 100 nt are generated at random locations and with no read errors, 0.1% read errors to emulate reads with mild error correction, and 1% read errors to emulate uncorrected reads.

General settings and processing options

Both reads and their reverse complements are added to the graph. Unless otherwise stated, the order 22 de Bruijn subgraph is generated: i.e. k=23 where k is the k-mer length represented by an edge. Note that in the article, the edge k-mer length is specified, whereas the implementation uses the order which is the vertex string length.

The distance between stored values of the previous position array is set to 32. The size of the k-mer/in-edge buffer used for the initial kFM-index construction is set depending on available memory, desired memory usage, or desired number of blocks (for assessment of kMF-index merging).

Computers used for benchmarking

Benchmarking has been done on different computers. The one labeled Laptop I is a Dell Latitude E6320 with Intel Core i7-2620 (2 cores running 4 threads) 2.70 GHz CPU and 8 GiB RAM running Java 6 under Windows 7. The one labeled Server I is a server with roughly twice the computational speed, running 8 threads, 32 GiB RAM. The amount of RAM made available to Java was controlled using the -Xmx option.

Memory usage for the kFM-index and computational speed

This section is concerned with the memory required to store the kFM-index, and the speed with which the de Bruijn subgraph information can be utilised. The test case is pre-assembly which consists of identifying all branching or ending vertices, i.e. those which have either in-degree or out-degree different from one, and producing the uniquely determined paths between these branching/ending vertices. This pre-assembly is presently done sequentially using only one thread, although it should lend itself to parallelisation.

For each vertex, 5 bit of information is used to store the vertex data. Since 12 vertices are packed into one 64 bit Java long, the final 4 bit of the long are not used. In addition, storing every 32nd position of the previous position array requires 0.875 bit per vertex. This results in 6.2 bit per vertex. However, there is additional overhead from the arrays used to store these values and from the application itself.

kFM-index
Pre-assembly
Data setVertices (Edges)Total RAMUnique pathsTime
E. coliLaptop I
Quality filtered13.4 M (13.4 M)18 MiB427 k12.9 sec
Raw reads81.7 M (83.7 M)69 MiB8.3 M227 sec
C. elegansServer I
Quality filtered255 M (259 M)290 MiB[1]12.8 M7.4 min
Soil sampleServer I
Quality filtered2.86 G (2.87 G)2.2 GiB85.6 M1.08 hr
Simulated readsLaptop I
No read errors2.05 M (2.05 M)63580.64 sec
0.1% read errors3.26 M (3.31 M)153 k3.7 sec
1% read errors15.7 M (16.1 M)1.37 M35 sec

[1] This number includes space not freed up after pruning away 61 M final-completing vertices. Compacting the kFM-index after pruning reduced the total memory to 232 MiB.

Total used memory is measured by the Java application and includes all overhead. This is not only memory used for array pointers used for organising the data, but also the RAM used to hold the application and GUI, as well as memory in object that have been released but not yet freed up by the garbage collector. At startup, before any data has been imported, the application itself takes up 8-10 MiB of RAM, so assessing the memory consuption related to the data requires that the data takes up substantially more memory than this on order to have any kind of accuracy. For estimating the total memory of the final kFM-index including all overhead, garbage collection is called before measuring total used memory. However, there will still be overhead from the application itself, and so this memory estimate is not solely due to the kFM-index.

Time and memory consumption for kFM-index construction

The construction of the kFM-index is done by generating a sequence of in-edges from the read data. This sequence is split into blocks, typically limited in size by the available RAM. Each block is sorted, and a kFM-index constructed from it. The resulting kFM-indexes are then merged pairwise until a single kFM-index results. Since the kFM-index merge operation is relatively slow, largest possible block size is an advantage to gain speed.

Note that final-completing in-edges, i.e. those containing less than k letters, are not included in the k-mer count, but need to be included in each block prior to kFM-index construction. Hence, the number of sequence read k-mers that can be included in each block is less than the block size.

Max memory usage during kFM-index construction includes everything: the application itself, the in-edge buffer used for processing the reads into initial kFM-indexes, the kFM-indexes themselves, and memory used and released but not yet garbage collected. The max used memory may be higher than required memory since Java may delay garbage collection until more RAM is actually needed.

Version 0.7

Version 0.7 has improved kFM-index construction speed for a few different reasons. The first is some improvements in the code, including a replacement of the original sorting algorithm. Next, list sorting and kFM-index merging has been parallelised. Finally, the in-edge list is no longer limited in size by the allowed size of Java arrays: a new implementation as arrays of arrays is used, which also allows it to grow more gently.

k-mer blocks
Construction time
Data setNo of k-mers#size1 threadparallelPeak mem
E. coliLaptop I4 threads
Quality filtered198 M2250 M4.7 min2.9 min≈2 GiB
Raw reads622 M5250 M27.6 min13.4 min≈2 GiB
C. elegansServer I8 threads
Quality filtered6.0 G81 G3.7 hr79 min10.8 GiB
42 G2.1 hr44 min17 GiB
33 G1.72 hr38 min21/25 GiB
Soil sampleServer I8 threads
Quality filtered2.59 G41 G5.5 hr1.64 hr15 GiB
23 G1.55 hr?24 GiB
13.5 G42 min21 min25 GiB
Simulated readsLaptop I4 threads
No read errors47.4 M120.8 sec11.9 sec
No read errors47.4 M610 M91 sec48 sec
0.1% read errors47.4 M121.9 sec12.7 sec
0.1% read errors47.4 M610 M118 sec59 sec
1% read errors47.4 M126.6 sec16.2 sec
1% read errors47.4 M610 M304 sec142 sec

Version 0.5 (as in the published article)

The benchmarks presented in the article are based on version 0.5 of the implementation. This was an implementation that ran in a single thread, i.e. no parallelisation, and had some memory constraints in the kFM-index construction phase: the initial list of in-edges was stored in a single array, which has limited size in Java. In addition, there was still some debugging code included, which slowed it down somewhat.

k-mer blocks
kFM construction
Data setNo of k-mers#sizeTimePeak mem
E. coliLaptop I
Quality filtered198 M2250 M7.7 min≈2 GiB
Raw reads622 M5250 M38 min≈2 GiB
C. elegansServer I
Quality filtered6.0 G81 G5.4 hr12 GiB
Soil sampleServer I
Quality filtered2.59 G41 G7.1 hr14 GiB
Simulated readsLaptop I
No read errors47.4 M147 sec
No read errors47.4 M610 M109 sec
0.1% read errors47.4 M149 sec
0.1% read errors47.4 M610 M139 sec
1% read errors47.4 M164 sec
1% read errors47.4 M610 M327 sec
Last modified November 09, 2013.