Manuel da Silva Santos Gomes Ferreira


Efficient Support for Selective MapReduce Queries


Thesis submitted for the MSc degree in Engenharia Informática e de Computadores, Instituto Superior Técnico, Universidade de Lisboa.

Abstract

Today, there is an increasing need to analyse very large datasets, commonly referred to as big data, which require specialised storage and processing infrastructures. MapReduce is a programming model aimed at processing big data in a parallel and distributed manner. Known for its scalability, ease of use and fault tolerance, MapReduce has been widely used by applications in different domains.

The work described in this thesis proposes and evaluates ShortMap, a system that combines several techniques to efficiently support selective MapReduce jobs, that is, jobs that are only concerned with a subset of the entire dataset. We combine the use of an appropriate data layout with data indexing tools to improve the data access speed and significantly shorten the Map phase of the jobs. An extensive experimental evaluation of ShortMap shows that, by avoiding reading irrelevant blocks, it can provide speedups of up to 80 times when compared to the basic Hadoop implementation. Further, our system also outperforms other MapReduce implementations that use variants of the techniques we have embedded in ShortMap. ShortMap is open source and available for download.


Publications

Efficient Support for Selective MapReduce Queries
Manuel da Silva Santos Gomes Ferreira
MSc Thesis. Instituto Superior Técnico, Universidade de Lisboa.
October, 2014.
Available: BibTeX, MSc thesis, extended abstract of the thesis, and mid-term report.
Suporte eficiente para pesquisas seletivas em MapReduce.
M. Ferreira, J. Paiva and L. Rodrigues.
In Proceedings of the 6th Simpósio de Informática (INForum), Porto, Portugal, September 2014.
Best student paper award, INForum 2014.
SmartFetch: Efficient Support for Selective Queries
M. Ferreira, J. Paiva, M. Bravo and L. Rodrigues.
In Proceedings of the 7th IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Vancouver, Canada, November 2015.
Best paper award

Luís Rodrigues