Posts

Showing posts with the label benchmark

Benchmarking in the web age

The TechEmpower website contains some fascinating benchmarks of servers. The results on this benchmark of multiple requests to servers provide some insight into the performance characteristics of .NET on a modern problem. Specifically, the C# on ASP.NET Core solutions range from 2.5× to 80× slower than the fastest solution, which is written in Rust. In fact, C# is beaten by the following programming languages, in order: Rust, Java, Kotlin, Go, C, Perl, Clojure, PHP and C++. Furthermore, .NET Core is Microsoft's new, improved and faster version of .NET aimed specifically at these kinds of tasks. So why is it beaten by all those languages? I suspect that a large part of this is the change in workload from the kind of number crunching .NET was designed for to a modern string-heavy workload, and I suspect .NET's GC isn't as optimised for this as the JVM's is. As we have found, .NET has really poor support for JSON compared to other languages and frameworks, with support fragmented across many non-stand...

Naïve Parallelization: C++ vs Haskell

A member of the Haskell community recently published a blog article revisiting our ray tracer language comparison, claiming to address the question of how naïve parallelizations in these two languages compare. The objective was to make only minimal changes to the programs in order to parallelize them and then compare performance. Our attempts to verify those results turned up a lot of interesting information. Firstly, the Haskell program that was supposedly naïvely parallelized was not the original but, in fact, a complete rewrite. This raises the question of whether or not the rewrite was specifically designed to be amenable to parallelization and, therefore, is not representative of naïve parallelization at all. The C++ used was the original with minimal changes to parallelize a single loop. Secondly, although the serial benchmark results covered a spectrum of inputs, the parallel results covered only a single case and retrospectively identified the optimal results without alluding ...

HLVM on the ray tracer language comparison

We recently completed a translation of our famous ray tracer language comparison to HLVM. The translation is equivalent to the most highly optimized implementations written in other languages, and this allows us to compare HLVM with a variety of competing languages for the first time. The results are astonishing. Running the benchmark with the default settings (level=9, n=5 to render 87,381 spheres at 512×512) on 32-bit x86 gives the following times for different languages: These results show that HLVM already provides competitive performance for a non-trivial benchmark. HLVM took 6.7s whereas C++ (compiled with g++ 4.3.3) took only 4.3s and Haskell (compiled with GHC 6.12) took 13.9s. However, cranking up the level parameter to 12 in order to increase the complexity of the scene, rendering a whopping 5,592,405 spheres, we find that HLVM blows away the other garbage-collected languages and is even able to keep up with C++: This remarkable result is a consequence of HLVM's space-ef...
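The two sphere counts quoted here (87,381 at level=9 and 5,592,405 at level=12) are both equal to (4^level - 1)/3. The following minimal sketch reproduces them under the assumption, not stated in the excerpt, that the scene is built recursively with every sphere above the last level carrying four smaller child spheres. The function name `spheres` is hypothetical.

```ocaml
(* Number of spheres in a scene of the given level, ASSUMING each sphere
   above the last level carries four smaller child spheres (an assumption
   chosen because it reproduces the counts quoted in the post). *)
let rec spheres level =
  if level <= 1 then 1 else 1 + 4 * spheres (level - 1)

let () =
  List.iter
    (fun level -> Printf.printf "level=%d: %d spheres\n" level (spheres level))
    [9; 12]
```

Under that assumption, raising the level from 9 to 12 multiplies the size of the scene by roughly 64, which is why the larger setting stresses memory use so much more than the default.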

High-performance parallelism with HLVM

Our open source HLVM project recently reached a major milestone with new support for high-performance shared-memory parallel programming. This is a major advance because it places HLVM among the world's fastest high-level language implementations. The previous HLVM implementation had demonstrated the proof of concept using a stop-the-world mark-sweep garbage collector. This new release optimizes that robust foundation by carrying thread-local data in registers in order to provide excellent performance. The following benchmark results show HLVM beating OCaml on many serial benchmarks on x86 despite the overheads of HLVM's new multicore-capable garbage collector: HLVM is over 2× faster than OCaml on average over these results and 4× faster on the floating-point Fibonacci function.
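The excerpt does not show the floating-point Fibonacci benchmark itself. A minimal OCaml sketch, assuming it is the usual doubly recursive definition with the integer arithmetic replaced by floats, might look like this (the name `fib` and the argument 30 are illustrative choices, not taken from the post):

```ocaml
(* Doubly recursive Fibonacci over floats: an ASSUMED form of the
   "floating-point Fibonacci" benchmark, which mostly measures the cost
   of function calls and of passing and returning float values. *)
let rec fib (n : float) : float =
  if n < 2.0 then n else fib (n -. 1.0) +. fib (n -. 2.0)

let () = Printf.printf "fib 30 = %.0f\n" (fib 30.0)
```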

OCaml vs F#: QR decomposition

Recent articles in the OCaml and F#.NET Journals derived high-level implementations of QR decomposition via Householder reductions. This numerical method has many applications, most notably in the computation of best fit parameters of linear sums.

Imperfections in OCaml

The OCaml programming language allows this algorithm to be expressed very elegantly in only 15 lines of code:

# let qr a = let m, n = Matrix.dim a in let rec qr_aux k q r qa = if k = n then q, Ma...
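
For readers who want something runnable, here is a minimal, self-contained sketch of QR decomposition by Householder reflections. It is not the journal's 15-line implementation: it assumes no `Matrix` module, stores matrices as plain `float array array` values in row-major order, and uses an imperative style. The helper `matmul` is a hypothetical name introduced only to check the result.

```ocaml
(* Householder QR for an m x n matrix stored as float array array.
   Returns (q, r) with q orthogonal (m x m), r upper triangular (m x n)
   and a = q * r. Illustrative sketch only, not the journal's code. *)
let qr a =
  let m = Array.length a in
  let n = if m = 0 then 0 else Array.length a.(0) in
  let r = Array.map Array.copy a in
  let q =
    Array.init m (fun i -> Array.init m (fun j -> if i = j then 1.0 else 0.0))
  in
  for k = 0 to min (m - 1) n - 1 do
    (* Norm of column k of r, from the diagonal downwards. *)
    let norm = ref 0.0 in
    for i = k to m - 1 do norm := !norm +. r.(i).(k) *. r.(i).(k) done;
    let norm = sqrt !norm in
    if norm > 0.0 then begin
      (* Householder vector v = x + sign(x0) * |x| * e1. *)
      let v = Array.init (m - k) (fun i -> r.(k + i).(k)) in
      v.(0) <- v.(0) +. (if v.(0) >= 0.0 then norm else -. norm);
      let vv = Array.fold_left (fun s x -> s +. x *. x) 0.0 v in
      (* r <- H r, where H = I - 2 v v^T / (v^T v). *)
      for j = 0 to n - 1 do
        let dot = ref 0.0 in
        for i = k to m - 1 do dot := !dot +. v.(i - k) *. r.(i).(j) done;
        let s = 2.0 *. !dot /. vv in
        for i = k to m - 1 do r.(i).(j) <- r.(i).(j) -. s *. v.(i - k) done
      done;
      (* q <- q H, accumulating the product of all reflections. *)
      for i = 0 to m - 1 do
        let dot = ref 0.0 in
        for j = k to m - 1 do dot := !dot +. q.(i).(j) *. v.(j - k) done;
        let s = 2.0 *. !dot /. vv in
        for j = k to m - 1 do q.(i).(j) <- q.(i).(j) -. s *. v.(j - k) done
      done
    end
  done;
  q, r

(* Dense matrix product, used here only to verify that q * r rebuilds a. *)
let matmul a b =
  let m = Array.length a and p = Array.length b in
  let n = if p = 0 then 0 else Array.length b.(0) in
  Array.init m (fun i ->
    Array.init n (fun j ->
      let s = ref 0.0 in
      for k = 0 to p - 1 do s := !s +. a.(i).(k) *. b.(k).(j) done;
      !s))
```

Calling `qr` on a small matrix and multiplying the factors back together with `matmul` should reproduce the input up to rounding error, with the entries of `r` below the diagonal close to zero.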