Nuno de Ferraz Almeida e Peixoto Machado


Lightweight Cooperative Logging for Fault Replication in Concurrent Programs


Thesis submitted for the Master's degree examination in Computer Science and Engineering (Engenharia Informática e de Computadores), Instituto Superior Técnico, Universidade Técnica de Lisboa.

Abstract

With the advent of multiprocessors came the need to develop parallel software that takes full advantage of the available computing resources and achieves better performance. However, writing and debugging concurrent software are very challenging tasks because of the non-deterministic nature of this kind of program, i.e., running the same program several times may lead to a different outcome on each run. The deterministic replay technique addresses this problem by providing a faithful reproduction of the original run. Unfortunately, deterministic replay comes with very high overhead, since it requires recording all sources of non-determinism to reproduce the original program behavior.

To address this problem, this thesis presents CoopLEAP, a system that provides fault replication for concurrent programs, based on cooperative recording and partial log combination. CoopLEAP employs a partial recording scheme to reduce the amount of information that a given program instance is required to store to support deterministic replay. The use of partial logs substantially reduces the overhead imposed by executing the instrumented code, but raises the problem of finding the combination of logs capable of replaying the fault. This thesis also proposes a heuristic, denoted Similarity-Guided Merge, to perform this search. An evaluation of the CoopLEAP prototype with in-house and third-party benchmarks shows that it not only successfully replays concurrency bugs, but also imposes smaller overheads than other existing solutions.
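The idea of combining partial logs can be illustrated with a small sketch. It is not the CoopLEAP implementation: the log layout (one thread-access order per shared variable), the `similarity` metric, and the greedy `greedy_merge` strategy are all assumptions made for illustration, and the actual Similarity-Guided Merge heuristic is defined in the thesis itself.

```python
from typing import Dict, List

# Hypothetical log layout: shared variable name -> thread ids in access order.
PartialLog = Dict[str, List[int]]


def similarity(a: PartialLog, b: PartialLog) -> float:
    """Fraction of variables recorded by both logs whose access orders match."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    return sum(a[v] == b[v] for v in common) / len(common)


def greedy_merge(seed: PartialLog, others: List[PartialLog],
                 variables: List[str]) -> PartialLog:
    """Extend `seed` with the most similar logs until all variables are covered."""
    merged = dict(seed)
    remaining = list(others)
    while set(variables) - set(merged) and remaining:
        # Only logs that contribute a still-missing variable are useful.
        useful = [log for log in remaining if set(log) - set(merged)]
        if not useful:
            break
        best = max(useful, key=lambda log: similarity(merged, log))
        for var, order in best.items():
            merged.setdefault(var, order)  # keep orders that were already chosen
        remaining.remove(best)
    return merged


# Three program instances, each recording only a subset of {x, y, z}.
seed = {"x": [1, 2, 1], "y": [2, 1]}
log_a = {"y": [2, 1], "z": [1, 1, 2]}  # agrees with the seed on y
log_b = {"y": [1, 2], "z": [2, 1, 1]}  # disagrees with the seed on y
candidate = greedy_merge(seed, [log_b, log_a], ["x", "y", "z"])
```

In this sketch, `log_a` is preferred over `log_b` because it agrees with the seed on the variable both recorded, so its order for `z` is more likely to come from a compatible execution. A real system would then attempt a replay with the candidate log and try another combination if the fault does not reproduce.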


Publications

Lightweight Cooperative Logging for Fault Replication in Concurrent Programs
Nuno de Ferraz Almeida e Peixoto Machado
MSc Thesis. Instituto Superior Técnico, Universidade Técnica de Lisboa.
October, 2011.
Available BibTeX, MSc thesis, extended abstract of the thesis, and mid-term report.
Reprodução de Faltas em Programas Concorrentes Através da Combinação de Múltiplos Históricos Parciais.
N. Machado, P. Romano, and L. Rodrigues.
Proceedings of the third Simpósio de Informática (Inforum), Coimbra, Portugal, September 2011.
Available BibTeX, extended report (pdf).
Lightweight Cooperative Logging for Fault Replication in Concurrent Programs.
N. Machado, P. Romano and L. Rodrigues
Proceedings of the 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), Boston (MA), USA, June 2012.
Available BibTeX, abstract (html) and report (pdf).

Luís Rodrigues