Titan performance

HPCC 1.4

HPCC run with 64 CPUs on Intel X5550 processors
HPCC run with 64 CPUs on AMD 2354 processors
HPCC run with 1024 CPUs on AMD 2354 (976) and 2367 (48) processors

Meta Data Benchmark

A simple meta data benchmark was written to exercise GPFS meta data performance. It unpacks a Linux kernel, counts the number of files, touches all files and removes all files. This is done concurrently for a large number of instances, and the time is recorded.
Files (save the scripts and MPI program):
SLURM run script
MPI application to launch and time
Bash script to do it
Titan results
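
The four phases described above can be illustrated with a minimal single-process Python sketch. This is not the SLURM/MPI/Bash code linked above: it creates a small synthetic file tree instead of unpacking a Linux kernel, and the function name and the parameters n_dirs and files_per_dir are hypothetical. Running many copies concurrently against the same file system would approximate the real test.

```python
import os
import tempfile
import time


def metadata_benchmark(root, n_dirs=10, files_per_dir=100):
    """Time create, count, touch and remove phases on a small file tree."""
    timings = {}

    # Phase 1: create many small files (stand-in for unpacking a kernel tree).
    start = time.perf_counter()
    for d in range(n_dirs):
        dirpath = os.path.join(root, f"dir{d:04d}")
        os.makedirs(dirpath)
        for f in range(files_per_dir):
            with open(os.path.join(dirpath, f"file{f:04d}"), "w") as fh:
                fh.write("x")
    timings["create"] = time.perf_counter() - start

    # Phase 2: count the files by walking the tree.
    start = time.perf_counter()
    count = sum(len(files) for _, _, files in os.walk(root))
    timings["count"] = time.perf_counter() - start

    # Phase 3: touch every file (update its timestamps).
    start = time.perf_counter()
    for dirpath, _, files in os.walk(root):
        for name in files:
            os.utime(os.path.join(dirpath, name))
    timings["touch"] = time.perf_counter() - start

    # Phase 4: remove all files, then the directories, bottom-up.
    start = time.perf_counter()
    for dirpath, _, files in os.walk(root, topdown=False):
        for name in files:
            os.remove(os.path.join(dirpath, name))
        if dirpath != root:
            os.rmdir(dirpath)
    timings["remove"] = time.perf_counter() - start

    return count, timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        count, timings = metadata_benchmark(root)
        print(count, timings)
```

Pointing root at a directory on the file system under test, rather than the default temporary directory, is what makes the timings meaningful.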

In addition, some meta data tests were run using Bonnie++; the results are available here.

GPFS performance

IOR, Meta-data benchmark and Bonnie++ benchmark results from Titan running GPFS.
Titan GPFS performance


A paper about how to use VMware and Sun Grid Engine in HPC. Paper-VM-nodes.

Benchmarking:

All reports and spreadsheets are provided as is. Support for clarifications and questions is limited. Spreadsheets are in OpenDocument / OpenOffice format or PDF. My workstation uses Norwegian language settings, so some spreadsheets may use ',' as the decimal separator.

GPU benchmarks

Using graphics cards for computation is of interest because the cards can deliver high performance. I have written a report about measured performance. In selected cases (linear algebra) very high performance can be attained. At present only single precision (32 bit) data types are supported.

MPI benchmarks

Key performance parameters for MPI implementations are latency and bandwidth, while the performance of collective operations and the ability to overlap communication and computation also affect the performance perceived by end-user applications. Interconnect performance is not the only factor: with the ever-growing number of CPUs/cores per socket, intra-node communication becomes more important as well. This is normally done using some kind of shared-memory communication through the L1, L2, (L3) caches or main memory.
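
How latency and bandwidth combine can be illustrated with the common linear cost model, in which sending n bytes takes latency + n / bandwidth seconds. The sketch below is an assumption-based illustration, not part of the reports linked here; the function names and the example figures (2 microseconds, 1 GB/s) are hypothetical.

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Time to send one message under the linear latency/bandwidth model."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def effective_bandwidth(nbytes, latency_s, bandwidth_bytes_per_s):
    """Bandwidth actually achieved for a message of nbytes."""
    return nbytes / transfer_time(nbytes, latency_s, bandwidth_bytes_per_s)


def half_performance_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which effective bandwidth is half the peak bandwidth."""
    return latency_s * bandwidth_bytes_per_s


if __name__ == "__main__":
    latency = 2e-6      # hypothetical: 2 microseconds
    bandwidth = 1e9     # hypothetical: 1 GB/s
    for size in (8, 1024, 65536, 4 * 1024 * 1024):
        bw = effective_bandwidth(size, latency, bandwidth)
        print(f"{size:>8} bytes: {bw / 1e6:10.1f} MB/s")
```

Under this model small messages are dominated by latency and large messages by bandwidth, which is why both figures are normally reported.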

Calculating benchmarks, HPL

High Performance Linpack is run on a range of different nodes using MPI on shared memory. This is usually a cluster test, but single node performance is important to assess the individual compute nodes. Today's nodes have as many CPUs as small clusters had some time ago. Some results are available.
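
As a reminder of how a Linpack figure relates to problem size and run time, here is a small sketch using the operation count commonly used for HPL, 2/3 N^3 + 3/2 N^2 floating point operations for a system of order N. The function names, and the node parameters in the example, are hypothetical and not taken from the results linked here.

```python
def hpl_flops(n):
    """Floating point operations credited for solving a system of order n."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n, seconds):
    """Performance in Gflop/s for a run of order n that took `seconds`."""
    return hpl_flops(n) / seconds / 1e9


def efficiency(measured_gflops, cores, clock_ghz, flops_per_cycle):
    """Fraction of theoretical peak (cores * clock * flops per cycle)."""
    peak_gflops = cores * clock_ghz * flops_per_cycle
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical single-node run: order 20000 solved in 120 seconds
    # on 8 cores at 2.0 GHz with 4 flops per cycle per core.
    gflops = hpl_gflops(20000, 120.0)
    print(f"{gflops:.1f} Gflop/s, {efficiency(gflops, 8, 2.0, 4):.0%} of peak")
```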

Calculating benchmarks, HPCC

The HPC Challenge (HPCC) benchmark suite, which includes Linpack, is run on Titan with different libraries and MPI implementations. Some results are available. A test of Intel quad-core versus AMD quad-core running SMP MPI has now been added.

Calculating benchmarks, Euroben-v 5.0

The EuroBen benchmark provides programs for scientific and technical computing to assess the performance of computers in these fields. All programs are written in Fortran 90/95. Some results are available.

Virtual compute nodes

Performance testing of virtual compute nodes running under VMware Server, using benchmarks and applications. Virtual compute nodes can be moved around on the physical nodes in a cluster. While there is a performance degradation, the ability to suspend, migrate and resume jobs in a low-priority/free queue outweighs the loss of performance. Running at 70-90% performance is far better than waiting in the ever-growing queue.
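
The trade-off can be made concrete with simple arithmetic: turnaround time is queue wait plus run time divided by relative performance. The sketch below is illustrative only; the function names and the 10 hour job are hypothetical, and only the 70-90% range comes from the text above.

```python
def turnaround(runtime_native, relative_performance, queue_wait=0.0):
    """Total time from submission to completion, in the units of the inputs."""
    return queue_wait + runtime_native / relative_performance


def break_even_wait(runtime_native, relative_performance):
    """Native queue wait above which an immediately started virtual node wins."""
    return runtime_native * (1.0 / relative_performance - 1.0)


if __name__ == "__main__":
    runtime = 10.0  # hypothetical job: 10 hours on a native node
    for perf in (0.7, 0.8, 0.9):
        print(f"{perf:.0%} performance: run takes {turnaround(runtime, perf):.1f} h, "
              f"break-even queue wait {break_even_wait(runtime, perf):.1f} h")
```

For example, at 80% performance a 10 hour job takes 12.5 hours, so the virtual node finishes first whenever the native queue wait exceeds 2.5 hours.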

Local or parallel file system scratch disk?

How should scratch space be used: local disk or parallel file system? What is the best option for scratch disk during a run: the local disk, or the parallel file system and its cluster of file servers? Also, what kind of performance will the parallel file system yield at different record sizes? How is the problem with random read addressed?
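
A rough way to see the effect of record size is to write the same amount of data with different record sizes and compare throughput. The sketch below is a minimal single-process illustration, not the method used in the report; the function name, file name and sizes are hypothetical, and it measures sequential writes only.

```python
import os
import tempfile
import time


def write_throughput(path, total_bytes, record_size):
    """Write total_bytes in records of record_size; return bytes per second."""
    record = b"\0" * record_size
    n_records = total_bytes // record_size
    start = time.perf_counter()
    # buffering=0 so each write() call issues one record to the OS.
    with open(path, "wb", buffering=0) as fh:
        for _ in range(n_records):
            fh.write(record)
        os.fsync(fh.fileno())  # include the time to flush to storage
    elapsed = time.perf_counter() - start
    return n_records * record_size / elapsed


if __name__ == "__main__":
    total = 64 * 1024 * 1024  # hypothetical: 64 MiB per test
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "recordsize.dat")
        for record_size in (4096, 65536, 1024 * 1024):
            rate = write_throughput(path, total, record_size)
            print(f"{record_size:>8} byte records: {rate / 1e6:8.1f} MB/s")
```

Replacing the temporary directory with a path on local scratch or on the parallel file system allows the two to be compared directly.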

Parallel file system and MPI-IO

How to get maximum bandwidth from a parallel file system using MPI-IO. Results from the IOR benchmark using GPFS. Note the good scaling of random read using MPI-IO.

IO performance with IOZone

IOZone benchmarking on different storage solutions.

IO performance - report

Report on IOZone benchmarking on different storage solutions. This is a working draft and not a published report.

Memory bandwidth using Stream

Stream memory bandwidth benchmarking on different servers.

Benchp1 benchmark in historical perspective

Some historically interesting benchp1 runs. The storage benchmarks are unreliable because ./ might be anything and is not known for the different runs. The CPU and memory benchmarks are OK.


