Posts

Showing posts with the label latency

Background reading on the reference counting vs tracing garbage collection debate

Image
Eight years ago I answered a question on Stack Overflow about the suitability of OCaml and Haskell for soft real-time work like visualization: " for real-time applications you will also want low pause times from the garbage collector. OCaml has a nice incremental collector that results in few pauses above 30ms but, IIRC, GHC has a stop-the-world collector that incurs arbitrarily-long pauses " My personal experience has always been that RAII in C++ incurs long pauses when using non-trivial data (i.e. nested, structured, collections of collections of collections, trees, graphs and so on), non-deferred reference counting has the same problem for the same reason, tracing garbage collectors like OCaml work beautifully but there are many notoriously bad tools like Java that have given tracing garbage collection a bad name. Now that I am revisiting this issue I am surprised to find many individuals and organisations repeating exactly the same experimental tests that I did and coming...

How do reference counting and tracing garbage collection compare?

Reference counting works by counting the number of references to each object. When the reference count drops to zero the object is definitely unreachable and can be recycled. The first advantage is simplicity. The second advantage is that decrements can occur as locals fall out of scope so collection can be deterministic and predictable. The third advantage is that the asymptotic complexity of reference counting does not depend upon the total size of the heap. The first problem is that cycles keep reference counts above zero even when they are unreachable, so reference counting alone cannot handle the general case. The second problem with reference counting is that incrementing and decrementing a counter in an object every time a reference to it is copied or dropped is very computationally expensive because it incurs a cache miss even if nothing else in the object is read or written. The third problem is that multithreaded reference counting is non-deterministic because increments and ...

Why do garbage collectors pause?

GCs work by tracing reachable heap blocks starting from a set of global roots (global variables, thread stacks and CPU registers). GCs lie on a sliding scale from snapshot to on-the-fly. Snapshot GCs work from a snapshot of the global roots and heap topology. On-the-fly GCs incrementally update their interpretation of the heap as the mutators run. Wholly snapshot GCs attain high throughput because the collector runs almost entirely independently of the mutators but have high latency because taking a snapshot incurs a pause. Wholly on-the-fly GCs attain low latency because everything is done incrementally but low throughput because of the fine grained communication between the mutators and GC. In practice, all GCs lie somewhere between these two  extremes.  VCGC  is primarily a snapshot GC but it uses a write barrier to keep the collector apprised of changes to the heap topology.  Staccato  was the world's first parallel and concurrent and real-time GC but it sti...

The LMAX disruptor and Baker's Treadmill

Image
A new retail financial trading platform called LMAX recently published their work on new software technology for low-latency servers that they call the "disruptor": We have noticed a striking similiarity between their disruptor and an old garbage collection algorithm from 1992 called Baker's Treadmill : The LMAX disruptor and the story behind it are very interesting in their own right and well worth reading up on. In their system, binary messages come from the "receiver" and are distributed to the "journaller", "replicator" and "unmarshaller" in parallel and the "unmarshaller" then passes on the deserialized messages to the "consumer". The core of their idea is to accomplish all of this message passing using a single shared data structure, the disruptor, rather than using several separate concurrent queues. Baker's Treadmill is a low-latency garbage collection algorithm. Allocated blocks in the heap are li...

Can you repro this 64-bit .NET GC bug?

Image
Update: Maoni Stephens of Microsoft, the author of this new "Background" garbage collector in .NET 4, has now reproduced the bug and acknowledged that we really are seeing GC pauses lasting several minutes. Whilst developing low latency software on .NET, we have discovered a serious bug in the .NET 4 concurrent workstation garbage collector that can cause applications to hang for up to several minutes at a time. On three of our machines the following simple C# program causes the GC to leak memory until none remains and a single mammoth GC cycle kicks in, stalling the program for several minutes (!) while 11Gb of heap is recycled: static void Main(string[] args) { var q = new System.Collections.Generic.Queue (); while (true) { q.Enqueue(0); if (q.Count > 1000000) q.Dequeue(); } } You need to compile for x64 on a 64-bit Windows OS with .NET 4 and run with the default (concurrent workstation) GC using the default (interactive) latency setting. Here's...