
What's new in garbage collection?

Since the 1950s there have been three main families of collectors: semi-space, mark-sweep and mark-compact. Almost all production GCs have been generational mark-sweep, even though they exhibit pathological performance when the nursery is full of survivors: the survivors are marked, (physically) copied and all references to them updated, which is roughly 3× slower than necessary. This pathology is common in practice, for example when filling a hash table with heap-allocated keys and/or values. The Beltway (2002) and Immix (2008) garbage collectors introduced a new family, the mark-region GCs. In a mark-region GC the entire heap is a collection of regions, including the nursery generation, so a nursery full of survivors can be logically aged simply by replacing it with another region. Sun's HotSpot JVM introduced the first mainstream mark-region GC with its G1 collector. The advent of multicore in 2005 has put more emphasis on parallel and concurrent garbage collectors. The Staccato (2008) garbage collector ...
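The "logical aging" trick can be illustrated with a toy sketch (hypothetical names; a real mark-region collector such as Immix or G1 would first collect the nursery and track per-line metadata before deciding to promote). The heap is a pool of fixed-size regions and the nursery is just one of them, so aging a nursery full of survivors is a pointer swap rather than a copy:

```python
REGION_SIZE = 4  # values per region (tiny, for illustration)

class Heap:
    def __init__(self):
        self.mature = []                            # regions promoted out of the nursery
        self.nursery = []                           # the current nursery region
        self.free_regions = [[] for _ in range(8)]  # pool of empty regions

    def alloc(self, value):
        if len(self.nursery) == REGION_SIZE:
            self.age_nursery()
        self.nursery.append(value)
        return value

    def age_nursery(self):
        # Logical aging: the full region is promoted wholesale and a fresh
        # region becomes the nursery -- no marking, copying or pointer
        # fixup for the survivors. (A real collector would only do this
        # when a nursery collection finds mostly survivors.)
        self.mature.append(self.nursery)
        self.nursery = self.free_regions.pop()

heap = Heap()
for i in range(10):
    heap.alloc(i)
print(len(heap.mature), len(heap.nursery))  # → 2 2
```

Compare this O(1) promotion with a generational copying nursery, which would have to mark the survivors, copy them into the mature space and rewrite every reference to them.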

Classifying garbage collection algorithms

Richard Jones' excellent new book about garbage collection has rekindled my fascination with this important subject. The Wikipedia page about GC is of disappointingly poor quality, so it is productive to review the main points here. GC algorithms are categorized into four main families, with production garbage collectors often combining algorithms from different families. The families are copying, mark-sweep, mark-compact and reference counting. The first three are all tracing collectors that work by tracing all reachable values, starting from the global roots (global variables and locals held on the stack). Copying collectors allocate into one space and then "evacuate" the reachable heap blocks (called "survivors") into another space before clearing the space they came from. The Cheney semi-space algorithm is the simplest copying collector: it uses two spaces, copying from one to the other and back again. The advantage of copying collection is that many heap-...
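Cheney's algorithm can be sketched in a few lines. This is a toy simulation, not HLVM code: objects are dicts with a "fields" list of references, and a "forward" key stands in for the forwarding pointer that a real collector writes into evacuated from-space objects:

```python
def cheney_collect(roots, from_space):
    """Evacuate everything reachable from `roots` into a fresh to-space.

    Returns (new_roots, to_space). After collection the whole of
    `from_space` can be discarded wholesale, which is what makes
    copying collection cheap when most of the space is garbage.
    """
    to_space = []

    def copy(obj):
        # Already evacuated: return its forwarding address (this is
        # what keeps cycles and shared structure intact).
        if "forward" in obj:
            return obj["forward"]
        new = {"fields": list(obj["fields"])}
        obj["forward"] = new      # leave a forwarding pointer behind
        to_space.append(new)
        return new

    new_roots = [copy(r) for r in roots]
    # Cheney scan: to_space doubles as the work queue; `scan` chases the
    # copied objects' fields breadth-first until no grey objects remain.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj["fields"] = [copy(f) for f in obj["fields"]]
        scan += 1
    return new_roots, to_space

# A reachable pair and one garbage object sharing a child:
a = {"fields": []}
b = {"fields": [a]}
garbage = {"fields": [a]}
roots, to_space = cheney_collect([b], [a, b, garbage])
print(len(to_space))  # → 2  (only b and a are evacuated)
```

Note that the to-space itself serves as the breadth-first work list: everything between the scan pointer and the end of to-space is "grey" (copied but not yet scanned), which is why the algorithm needs no auxiliary mark stack.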

High-performance parallelism with HLVM

Our open source HLVM project recently reached a major milestone with new support for high-performance shared-memory parallel programming. This is a major advance because it places HLVM among the world's fastest high-level language implementations. The previous HLVM implementation had demonstrated the proof of concept using a stop-the-world mark-sweep garbage collector. This new release optimizes that robust foundation by carrying thread-local data in registers in order to provide excellent performance. The following benchmark results show HLVM beating OCaml on many serial benchmarks on x86 despite the overheads of HLVM's new multicore-capable garbage collector: HLVM is over 2× faster than OCaml on average over these results and 4× faster on the floating-point Fibonacci function.

How does HLVM's GC work?

HLVM injects code into managed user-code functions to keep two data structures up to date:

The shadow stack lists all references held on the stack or in registers and acts as the set of global roots into the heap. If a function accepts an argument of a reference type or allocates, the value is pushed onto the shadow stack. On function exit, the shadow stack is reset back to the way it was when the function was entered.

The allocated list records every reference that the program has allocated. When a function allocates, the value is appended to the allocated list.

When the allocation quota is exceeded, a garbage collection occurs. This is composed of two phases:

Mark. Starting from the global roots on the shadow stack, all reachable data in the heap are marked as reachable (and the rest are left marked as unreachable).

Sweep. The allocated list is traversed, unreachable values are freed and all surviving values have their mark state reset to unreachable.

This functionality ...
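The scheme described above can be sketched as a toy simulation (hypothetical names, not HLVM's actual implementation, which does this with injected LLVM code): a shadow stack of roots, an allocated list, a quota that triggers collection, and the two phases:

```python
class Obj:
    """A heap value: a mark bit plus references to other heap values."""
    def __init__(self, refs=()):
        self.marked = False
        self.refs = list(refs)

class GC:
    """Toy stop-the-world mark-sweep in the style described above."""
    def __init__(self, quota=4):
        self.shadow_stack = []   # the global roots into the heap
        self.allocated = []      # every value allocated so far
        self.quota = quota       # allocation quota before a collection

    def alloc(self, refs=()):
        if len(self.allocated) >= self.quota:
            self.collect()
        obj = Obj(refs)
        self.allocated.append(obj)   # append to the allocated list
        return obj

    def collect(self):
        # Mark: everything reachable from the shadow stack.
        stack = list(self.shadow_stack)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)
        # Sweep: free unmarked values, reset survivors to unreachable.
        survivors = [o for o in self.allocated if o.marked]
        for o in survivors:
            o.marked = False
        self.allocated = survivors

gc = GC(quota=4)
root = gc.alloc()
gc.shadow_stack.append(root)     # root is a global root
child = gc.alloc()
root.refs.append(child)          # child is reachable via root
for _ in range(3):
    gc.alloc()                   # garbage: reachable from nowhere
gc.collect()
print(len(gc.allocated))         # → 2  (root and child survive)
```

In the real collector the "objects" are HLVM's reference-typed values, the push/pop of the shadow stack is code injected at function entry and exit, and the sweep calls free on each dead allocation rather than relying on Python's own memory management.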