Larchant: garbage collection in a cached distributed shared store with persistence by reachability

The model of Larchant is that of a Shared Address Space (spanning every site in a network including secondary storage) with Persistence By Reachability. To provide the illusion of a shared address space across the network, despite the fact that site memories are disjoint, Larchant implements a distributed shared memory mechanism. Reachability is accessed by tracing the pointer graph, starting from the persistent root, and reclaiming unreachable objects. This is the task of Garbage Collection (GC). GC was until recently thought to be intractable in a large-scale system, due to problems of scale, incoherence, asynchrony, and performance.

This thesis presents the solutions that Larchant proposes to these problems. The GC algorithm in Larchant combines tracing and reference-listing. It traces whenever economically feasible, i.e., as long as the memory subset being collected remains local to a site, and counts references that would cost I/O traffic to trace. GC is orthogonal to coherence, i.e., makes progress even if only incoherent replicas are locally available. The garbage collector runs concurrently and asynchronously to applications. The reference-listing boundary changes dynamically and seamlessly, and independently at each site, in order to collect cycles of unreachable objects. We prove formally that our GC algorithm is correct, i.e., it is safe and live. The performance results from our Larchant prototype show that our design goals (scalability, coherence orthogonality, and good performance) are fulfilled.

Keywords: distributed systems, object-oriented systems, persistence by reachability, distributed shared memory, distributed garbage collection.