Rotor
Motivation
It is well known that distributed systems pose serious difficulties concerning memory management. When done manually, it leads to memory leaks and dangling references causing applications to fail. We address this problem by developing a distributed garbage collector for Rotor.
We believe that such an extension of the Rotor capabilities (that already has a local garbage collector and uses leases for distributed garbage collection) is very important for supporting cooperative work among users. As a matter of fact, data sharing in a distributed environment, which is required for cooperative work, results not only in dangling references (references to objects erroneously deleted) but also in memory leaks (useless objects that were not deleted). These errors have a strong negative impact on programmer's productivity and programs robustness. That is why .Net does have a local garbage collector.
Research
The current approach to distributed garbage collection in Rotor is very simple. Leases are used so that when an object X is not remotely invoked for a certain amount of time (bigger than its lease) the reference pointing to X is ignored for reachability considerations; therefore, X may be collected even if there is still a reference pointing to it, from some other process. This means that safety may not be ensured.
Our proposal is the to extend Rotor with a distributed garbage collector. The algorithm to be used is based on reference-listing and must have the following properties: safe, live, complete, do not degrade the performance of applications, and well integrated with Rotor. Developing such a distributed garbage collector is not an easy task. The main difficulties, resulting from the properties listed above, are due to crashes, message lost or duplication, and races.
- Safety: this means that an object should not be collected as long as a reference points to it. The difficulty arises from clients that may crash or from communication problems; this may lead to a situation in which the system is never informed that a remote object X is, in fact, no longer referenced. To deal with this situation, the use of leases seems to be a good approach, i.e. it relaxes safety while making the system tolerant to such faulty situations. We intend to use this solution, already used in Rotor, as a last resource. Under normal circumstances, i.e. without crashes or communication failures, the reference-listing algorithm will ensure safety.
- Liveness: this means that the distributed garbage collector must do its work, i.e. if there are garbage objects they must be reclaimed.
- Completeness: this means that all objects that are garbage must be collected. This is a problem in a distributed system because of cycles. This is a well-known problem that we intend to address.
- Performance: this means that applications should not be slowed down because of the distributed garbage collector. We intend to address this problem by carefully designing implementing, measuring and optimizing the collector.
- Integration: this means that the Rotor system should not be modified, i.e. we will try to limit the modifications done to the code. Ideally, there will be no major changes. The new code will be developed as a module that will have a well-defined interface with the rest of the system, in particular, with the existent local garbage collector.
In summary, the most important difficulties are: i) reclaiming distributed cycles of garbage, and ii) achieving good performance and integration of the implementation.
Results
The main result of this project is a software module integrated with Rotor that provides a distributed garbage collector with the properties previously mentioned.
Sponsoring bodies: Microsoft Research
Coordinator: Paulo Ferreira
Partners: INESC-ID, Microsoft Research
Homepage: N/A

