Data Management in Cloud, Grid and P2P Systems

Reducing data transfer in MapReduce's shuffle phase is very important because it increases data locality of reduce tasks, and thus decreases the overhead of job executions. In the literature, several optimizations have been proposed to reduce data transfer between mappers and reducers. Nevertheless, all these approaches are limited by how intermediate key-value pairs are distributed over map outputs. In this paper, we address the problem of high data transfers in MapReduce, and propose a technique that repartitions tuples of the input datasets, and thereby optimizes the distribution of key-values over mappers, and increases the data locality in reduce tasks. Our approach captures the relationships between input tuples and intermediate keys by monitoring the execution of a set of MapReduce jobs which are representative of the workload. Then, based on those relationships, it assigns input tuples to the appropriate chunks. We evaluated our approach through experimentation in a Hadoop deployment on top of Grid5000 using standard benchmarks. The results show high reduction in data transfer during the shuffle phase compared to native Hadoop.
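
The repartitioning idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the record does not describe in detail; it is a simple greedy heuristic written under assumptions. The names `repartition`, `shuffle_pairs`, `tuple_keys`, `num_chunks` and `chunk_capacity` are hypothetical, and the monitored relationship between input tuples and intermediate keys is assumed to be available as a plain dictionary. The cost measure assumes a combiner, so each chunk sends at most one pair per distinct key into the shuffle.

```python
from collections import Counter


def repartition(tuple_keys, num_chunks, chunk_capacity):
    """Greedily place each tuple in the chunk that already produces
    the most of its intermediate keys (illustrative heuristic only)."""
    if num_chunks * chunk_capacity < len(tuple_keys):
        raise ValueError("chunks cannot hold all tuples")
    chunks = [[] for _ in range(num_chunks)]
    key_counts = [Counter() for _ in range(num_chunks)]
    for tuple_id, keys in tuple_keys.items():
        # Only chunks with free space are candidates.
        candidates = [c for c in range(num_chunks)
                      if len(chunks[c]) < chunk_capacity]
        # Prefer the chunk sharing the most keys; break ties by emptiness.
        best = max(
            candidates,
            key=lambda c: (sum(1 for k in keys if key_counts[c][k] > 0),
                           -len(chunks[c])),
        )
        chunks[best].append(tuple_id)
        key_counts[best].update(keys)
    return chunks


def shuffle_pairs(chunks, tuple_keys):
    """Count distinct (chunk, key) pairs, a rough proxy for shuffle volume
    when each map output is combined per key."""
    total = 0
    for chunk in chunks:
        distinct = set()
        for tuple_id in chunk:
            distinct.update(tuple_keys[tuple_id])
        total += len(distinct)
    return total


# Hypothetical monitoring result: tuple id -> intermediate keys it emits.
tuple_keys = {"t1": {"a"}, "t2": {"b"}, "t3": {"a"}, "t4": {"b"}}
naive = [["t1", "t2"], ["t3", "t4"]]           # tuples in arrival order
better = repartition(tuple_keys, num_chunks=2, chunk_capacity=2)
print(shuffle_pairs(naive, tuple_keys), shuffle_pairs(better, tuple_keys))
```

In this toy input, co-locating tuples that emit the same key halves the number of distinct (chunk, key) pairs compared with the arrival-order split, which is the effect the abstract attributes to repartitioning.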

Statement of Responsibility

Author(s)

Abdelkader Hameurlain, Wenny Rahayu, David Taniar (Eds.) - Personal Name

Edition

Call Number

ISBN/ISSN

978-3-642-40052-0

Subject(s)

Data Management in Cloud, Grid and P2P Systems

Classification

NONE

Series Title

GMD

Management

Language

English

Publisher

Publishing Year

2013

Publishing Place

Collation

1-133

Specific Detail Info
