
Showing posts with the label hash table

Hash table insertion performance: F# vs OCaml vs Haskell

Time taken to insert ten million int→T bindings, for T each of int, float, complex, and string, using F# on .NET 4.5, OCaml 3.12, and GHC 7.4.2, on a 3.4GHz Intel Core i7-3770 running Windows 7. Value types and reified generics make .NET's Dictionary implementation very efficient for value types like int, float, and complex (unmatched by any other language/VM, including C, C++, and Java). F# is 50% faster than OCaml and Haskell with the reference type string, perhaps because the .NET write barrier is significantly more efficient. Note: we use "int" to refer to ordinary 32-bit ints in F# and Haskell but 31-bit ints in OCaml; using 32-bit ints in OCaml would be much slower because they are boxed (i.e. heap allocated).
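A minimal sketch of the int→int case of this measurement, using OCaml's standard Hashtbl. The timing method (CPU time via Sys.time) and table sizing are illustrative assumptions, not the original harness:

```ocaml
(* Insert ten million int -> int bindings into an initially empty
   Hashtbl and report the time taken.  A sketch of the benchmark
   methodology only. *)
let () =
  let n = 10_000_000 in
  let m = Hashtbl.create n in
  let t0 = Sys.time () in
  for i = 1 to n do
    Hashtbl.add m i i
  done;
  Printf.printf "inserted %d bindings in %.2fs\n" n (Sys.time () -. t0);
  (* sanity check: the binding for key 100 is present *)
  assert (Hashtbl.find m 100 = 100)
```

Pre-sizing the table with `Hashtbl.create n` avoids rehashing during the run, so the numbers reflect insertion cost rather than resizing policy.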

Haskell's hash tables revisited: part 2

Our previous blog post contained several benchmark results comparing the new GHC 6.12.3 with F#. We have since discovered some other points of interest regarding this benchmark. Firstly, the Haskell results rely on a garbage collector that prevents parallelism. If the more modern multicore-friendly GC is used (by compiling with -threaded and running with +RTS -N8), the time taken increases from 4.5s to 10.6s: over 2× slower than before and now over 13× slower than F#. Naturally, the F# code was already using the multicore-capable .NET garbage collector, so this was an unfair bias in favor of Haskell. Secondly, the Haskell code exploits an algorithmic optimization that assumes the keys are unique. This is often not the case in practice and, again, the F# code did not exploit such an assumption, so this was another unfair bias in favor of Haskell. A fairer comparison may be obtained by changing from the insert function to the update function in the Hask...
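The insert-versus-update distinction has a direct analogue in OCaml's Hashtbl, which makes the duplicate-key assumption concrete. This is a sketch of the semantics, not the Haskell benchmark itself:

```ocaml
let () =
  (* Hashtbl.add behaves like "insert": it keeps earlier bindings for
     the same key, so repeated keys accumulate. *)
  let m = Hashtbl.create 16 in
  Hashtbl.add m 1 "first";
  Hashtbl.add m 1 "second";
  assert (Hashtbl.find m 1 = "second");              (* newest binding shadows *)
  assert (List.length (Hashtbl.find_all m 1) = 2);   (* old one still stored *)

  (* Hashtbl.replace behaves like "update": it overwrites any existing
     binding, so repeated keys cost no extra space. *)
  let m' = Hashtbl.create 16 in
  Hashtbl.replace m' 1 "first";
  Hashtbl.replace m' 1 "second";
  assert (List.length (Hashtbl.find_all m' 1) = 1);
  print_endline "ok"
```

An insert-only table can skip the lookup that update must perform to find an existing binding, which is exactly the algorithmic shortcut that only pays off when keys are known to be unique.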

Haskell's hash tables revisited

Update: We have since discovered that these results were biased towards Haskell. Mikhail Glushenkov recently announced the Haskell Platform 2010.2 RC for Windows. In particular, this is the first release to include a version of the Glasgow Haskell Compiler (6.12.3) with the garbage-collector fix for the performance problems Haskell programmers have been experiencing over the past 5 years with mutable arrays of boxed values, such as the spines of hash tables. Naturally, we couldn't resist benchmarking the new release to see if it lives up to the promise of decent hash table performance. Even though this installer for the Haskell Platform is only a release candidate, we found that it installed smoothly and ran correctly the first time. First up, a repeat of our previous benchmark, which inserted 10,000,000 bindings with int keys mapping to int values into an initially empty hash table: GHC 6.12.1: 19.2s; GHC 6.12.3: 4.48s; F# .NET 4: 0.8s. The new version of GHC is clear...

Haskell's hash table woes (again)

Alessandro Stamatto recently asked a question on Stack Overflow about the current state of Haskell's hash tables, following our rediscovery last year that Haskell's hash tables were up to 32× slower than common alternatives like C++ and .NET, and even slower than Python. Don Stewart edited Alessandro's original question, deleting the citation to the original study that provided a verifiable experiment with complete code, before writing an answer claiming that "the original citation doesn't present any code or benchmarks". Although Don provided a variety of Haskell-based performance measurements, he omitted any measurements of decent hash table implementations, such as those readily available in C++ or any .NET language, that are still 10× faster than anything that can be written in Haskell. Norman Ramsey's answer took this a step further by drawing the conclusion that "functional data structures perform pretty well" despite the overwhelming evid...

More on Haskell's hash table problems

We recently showed that Haskell's de facto standard hash table implementation is extremely slow. So slow that optimized native-code compiled Haskell is even slower than interpreted Python and 32× slower than a decent hash table implementation! If that result were not remarkable enough, we have now discovered that the history of this issue is even more surprising... Haskell programmers have been complaining about the poor performance of its hash table implementation for at least 3 years. The cause of the problem, a bug in the garbage collector, was identified a long time ago, but that bug was never fixed. Instead, it appears that the Haskell developers went on a crusade to persuade the world that decent hash table implementations are not advantageous. For example, the book "Real World Haskell" by Don Stewart et al. explicitly states: "Compared to a hash table, a well-implemented purely functional tree data structure will perform competitively. You should not approach...
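The quoted claim is easy to probe directly. A rough OCaml sketch comparing a mutable Hashtbl against the purely functional Map (a balanced binary tree); the size and timing method are illustrative assumptions, and relative results will vary by machine and runtime:

```ocaml
(* Compare insertion of n int -> int bindings into a mutable hash table
   versus a purely functional balanced tree (Map). *)
module IntMap = Map.Make (Int)

let n = 1_000_000

let time label f =
  let t0 = Sys.time () in
  let r = f () in
  Printf.printf "%s: %.2fs\n" label (Sys.time () -. t0);
  r

let () =
  let ht =
    time "Hashtbl" (fun () ->
        let m = Hashtbl.create n in
        for i = 1 to n do Hashtbl.replace m i i done;
        m)
  in
  let map =
    time "Map" (fun () ->
        let rec go acc i =
          if i > n then acc else go (IntMap.add i i acc) (i + 1)
        in
        go IntMap.empty 1)
  in
  (* both structures agree on lookups *)
  assert (Hashtbl.find ht 100 = 100);
  assert (IntMap.find 100 map = 100)
```

The tree pays O(log n) pointer-chasing and allocation per insertion where the hash table pays amortized O(1) array writes, which is the asymptotic gap the debate above is about.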

F# vs OCaml vs Haskell: hash table performance

I just stumbled upon an interesting benchmark: insert the mappings i → i for i in 1..10M into a hash table, then print the entry for key 100.

Don Stewart's optimized Haskell:

```haskell
import Control.Monad
import qualified Data.HashTable as H

main = do
  m <- H.new (==) H.hashInt
  forM_ [1..10000000] $ \n -> H.insert m n n
  v <- H.lookup m 100
  print v
```

My OCaml translation:

```ocaml
let n = 10000000

let () =
  let m = Hashtbl.create n in
  for i = 1 to n do
    Hashtbl.add m i i
  done;
  Printf.printf "%d\n%!" (Hashtbl.find m 100)
```

My F# translation:

```fsharp
do
  let m = System.Collections.Generic.Dictionary()
  for i = 1 to 10000000 do
    m.[i] <- i
  printf "%d\n" m.[100]
```

Performance: Haskell: 22.8s; OCaml: 2.82s; F#: 0.706s. I found it remarkable that the performance difference is so large: F# is 32× faster than the optimized Haskell here! Thanks to everyone who has replied. Apparently this is a fundamental design flaw in GHC's garbage collector, which is incapable of handling the mutation of arrays of boxed values (e.g. ha...