Nuno de Ferraz Almeida e Peixoto Machado
Lightweight Cooperative Logging for Fault Replication in Concurrent Programs
Thesis submitted for the Master's degree examination in Information Systems and Computer Engineering (Engenharia Informática e de Computadores)
Instituto Superior Técnico, Universidade Técnica de Lisboa.
Abstract
With the advent of multiprocessors came the need to develop parallel
software that takes full advantage of the available computing
resources and achieves better performance. However, writing and
debugging concurrent software are very challenging tasks because such
programs are non-deterministic: running the same program several times
may produce a different outcome on each run. Deterministic replay
addresses this problem by faithfully reproducing the original run.
Unfortunately, deterministic replay incurs very high overhead, since
it requires recording all sources of non-determinism in order to
reproduce the original program behavior.
To address this problem, this thesis presents CoopLEAP, a system that
provides fault replication for concurrent programs, based on
cooperative recording and partial log combination. CoopLEAP employs a
partial recording scheme to reduce the amount of information that a
given program instance must store to support deterministic replay.
Partial logs substantially reduce the overhead imposed by executing
instrumented code, but raise the problem of finding a combination of
logs capable of replaying the fault. This thesis also proposes a
heuristic, denoted Similarity-Guided Merge, to perform this search. An
evaluation of the CoopLEAP prototype with in-house and third-party
benchmarks shows that it not only successfully replays concurrency
bugs, but also imposes smaller overheads than other existing
solutions.
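The idea of combining partial logs can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the log layout (a map from a shared-variable name to the order of thread IDs that accessed it), the agreement-based `similarity` score, and the greedy `greedy_combine` loop. It is not the actual Similarity-Guided Merge heuristic, and a real system would still have to check each candidate by attempting the replay.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: a partial log maps a shared-variable name to the
# ordered list of thread IDs that accessed it during one recorded run.
PartialLog = Dict[str, List[int]]


def similarity(a: PartialLog, b: PartialLog) -> float:
    """Fraction of commonly recorded variables whose access orders agree.

    Logs with no variable in common score 0.0, since nothing can be compared.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[v] == b[v] for v in shared) / len(shared)


def merge(base: PartialLog, other: PartialLog) -> PartialLog:
    """Add the variables that only `other` recorded; `base` wins on overlap."""
    merged = dict(base)
    for var, order in other.items():
        merged.setdefault(var, order)
    return merged


def greedy_combine(logs: List[PartialLog],
                   variables: Set[str]) -> Optional[PartialLog]:
    """Start from the first log and keep merging the most similar remaining
    log until every variable is covered. Returns None if coverage fails."""
    if not logs:
        return None
    combined = dict(logs[0])
    remaining = list(logs[1:])
    while not variables.issubset(combined) and remaining:
        best = max(remaining, key=lambda log: similarity(combined, log))
        remaining.remove(best)
        combined = merge(combined, best)
    return combined if variables.issubset(combined) else None


# Three instances each recorded two of the three shared variables.
logs = [
    {"x": [1, 2], "y": [2, 1]},
    {"y": [1, 2], "z": [1, 1]},  # disagrees with the first log on y
    {"y": [2, 1], "z": [2, 1]},  # agrees with the first log on y
]
result = greedy_combine(logs, {"x", "y", "z"})
```

In this example the third log is preferred over the second because it agrees with the first log on the variable they both recorded, so the candidate complete log takes its order for `z`.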
Publications
- Lightweight Cooperative Logging for Fault Replication in Concurrent Programs
- Nuno de Ferraz Almeida e Peixoto Machado
- MSc Thesis. Instituto Superior Técnico, Universidade Técnica de Lisboa.
- October, 2011.
- Available: BibTeX, MSc Thesis, extended abstract of the thesis, and mid-term report.
- Reprodução de Faltas em Programas Concorrentes Através da Combinação de Múltiplos Históricos Parciais (Fault Replication in Concurrent Programs Through the Combination of Multiple Partial Logs).
- N. Machado, P. Romano, and L. Rodrigues.
- Proceedings of the Third Simpósio de Informática (Inforum), Coimbra, Portugal, September 2011.
- Available: BibTeX, extended report (pdf).
- Lightweight Cooperative Logging for Fault Replication in Concurrent Programs.
- N. Machado, P. Romano, and L. Rodrigues.
- Proceedings of the 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), Boston (MA), USA, June 2012.
- Available: BibTeX, abstract (html) and report (pdf).
Luís Rodrigues