Posts

Showing posts with the label numerical

Performance: OCaml vs HLVM beta 0.4

A quick update due to a surprising performance result for HLVM! We knew that manipulating the shadow stack was a major source of slowdown in HLVM, judging by the change in our benchmark results when the GC was first introduced a few weeks ago, but we did not expect that a simple local optimization, unrolling, could go so far towards recovering performance. Moreover, this optimization was implemented in only a few lines of OCaml code. The following results show that HLVM now produces substantially faster programs than ocamlopt for numerical tasks on x86. Interestingly, now that HLVM supports standalone compilation using LLVM's IR optimization passes as well as unoptimized JIT compilation, we see that LLVM's optimization passes give only small performance improvements in many cases and even substantially degrade performance on our 10-queens benchmark. This is a direct result of HLVM producing near-optimal code directly (which greatly reduces compile times) and has eliminated our desire to add sup...
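To make the idea of unrolling concrete, here is a minimal hand-written sketch in OCaml. It is only an illustration of the general technique applied to a recursive function, not HLVM's actual transformation: the name `fib_unrolled` is hypothetical, and the assumption is that the benefit comes from paying any per-call overhead (such as shadow stack manipulation) once for every two levels of recursion instead of once per level.

```ocaml
(* Plain recursive Fibonacci: every level of recursion is a separate call. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

(* The same function with both recursive calls unrolled once by hand.
   Each invocation now does the work of two levels of the original,
   so roughly half as many calls reach any given depth. *)
let rec fib_unrolled n =
  if n < 2 then n
  else
    let a =
      (* inlined body of fib (n - 1) *)
      if n - 1 < 2 then n - 1
      else fib_unrolled (n - 2) + fib_unrolled (n - 3)
    in
    let b =
      (* inlined body of fib (n - 2) *)
      if n - 2 < 2 then n - 2
      else fib_unrolled (n - 3) + fib_unrolled (n - 4)
    in
    a + b
```

Both functions compute the same values; only the number of calls per unit of work differs.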

Performance: OCaml vs HLVM beta 0.3

Our HLVM project is evolving rapidly and recently saw its first release with a working garbage collector. This has allowed us to make some performance measurements and, as we had hoped, the results are extremely compelling. The following graph illustrates the time taken to perform each of the benchmarks in the HLVM test suite on one of the 2.1 GHz Opteron 2352 cores in an 8-core Dell PowerEdge.

The individual benchmarks are:

fib: The usual Fibonacci function.
ffib: A floating-point version of the Fibonacci function.
sieve: Sieve of Eratosthenes to find all of the primes under 10^8 and print the last one, using a byte array.
mandel: Mandelbrot rendering with float arithmetic.
mandel2: Mandelbrot rendering with abstracted complex arithmetic, where complex numbers are pairs of floating-point numbers.
Array.fold: Initialize a 10^8-element float array and fold a float * float pair over it.
List.init/fold: Initialize a 10^6-element int list and fold over it (allocation intensive).
1...
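For readers unfamiliar with these benchmarks, the first three can be sketched in OCaml roughly as follows. These are illustrative versions written from the descriptions above, not the code from the HLVM test suite: the function names, the `x < 1.5` base case in `ffib`, and the `limit` parameter are assumptions made for the sketch.

```ocaml
(* fib: the usual doubly recursive Fibonacci function on ints. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

(* ffib: the same recursion on floats (base case chosen for this sketch). *)
let rec ffib x = if x < 1.5 then x else ffib (x -. 1.0) +. ffib (x -. 2.0)

(* sieve: Sieve of Eratosthenes over a byte array.
   Returns the largest prime strictly below [limit], or 0 if there is none. *)
let last_prime_below limit =
  let composite = Bytes.make limit '\000' in
  let last = ref 0 in
  for i = 2 to limit - 1 do
    if Bytes.get composite i = '\000' then begin
      last := i;
      (* mark every multiple of i, starting at i * i, as composite *)
      let j = ref (i * i) in
      while !j < limit do
        Bytes.set composite !j '\001';
        j := !j + i
      done
    end
  done;
  !last
```

The benchmark described above uses a limit of 10^8; a call such as `last_prime_below 100` is enough to check the logic on a small input.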