Manuel da Silva Santos Gomes Ferreira
Efficient Support for Selective MapReduce Queries
Thesis submitted for the Master's degree examination in Information Systems and Computer Engineering
Instituto Superior Técnico, Universidade de Lisboa.
Today, there is an increasing need to analyse very large datasets,
commonly referred to as big data, which require specialised storage
and processing infrastructures. MapReduce is a programming model aimed
at processing big data in a parallel and distributed manner. Known for
its scalability, ease of use and fault-tolerance, MapReduce has been
widely used by applications in different domains.
The work described
in this thesis proposes and evaluates ShortMap, a system that relies
on a combination of techniques aimed at efficiently supporting
selective MapReduce jobs that are only concerned with a subset of the
entire dataset. We combine the use of an appropriate data layout with
data indexing tools to improve the data access speed and significantly
shorten the Map phase of the jobs. An extensive experimental
evaluation of ShortMap shows that, by avoiding reading irrelevant
blocks, it can provide speedups of up to 80 times when compared to the
basic Hadoop implementation. Further, our system also outperforms
other MapReduce implementations that use variants of the techniques we
have embedded in ShortMap. ShortMap is open source and available for
- Efficient Support for Selective MapReduce Queries
- Manuel da Silva Santos Gomes Ferreira
- MSc Thesis. Instituto Superior Técnico,
Universidade de Lisboa.
- October, 2014.
- Available: BibTeX, MSc thesis, extended abstract of the
thesis, and mid-term
- Suporte eficiente
para pesquisas seletivas em MapReduce.
- M. Ferreira, J. Paiva and L. Rodrigues.
- In Proceedings of the 6th
Simpósio de Informática (INForum), Porto,
Portugal, Sep. 2014.
- Best student paper award, INForum
- SmartFetch: Efficient Support for Selective Queries
- M. Ferreira, J. Paiva, M. Bravo and
- In Proceedings of the 7th IEEE International Conference
on Cloud Computing Technology and Science (CloudCom), Vancouver,
Canada, November 2015.
- Best paper