Una estrategia basada en MapReduce para compresión de Big Semantic Data con HDT (A MapReduce-based Approach to Scale Big Semantic Data Compression with HDT)

José Miguel Giménez (jose.gimenez.garcia@univ-st-etienne.fr)1, Javier David Fernández (jfernand@wu.ac.at)2, Miguel Ángel Martínez (migumar2@infor.uva.es)3

1Université Jean Monnet
2Vienna University of Economics and Business
3Universidad de Valladolid

This paper appears in: Revista IEEE América Latina

Publication Date: July 2017
Volume: 15, Issue: 7
ISSN: 1548-0992

Data generation and publication on the Web have increased over the last years. This phenomenon, usually known as “Big Data”, poses new challenges related to the Volume, Velocity, and Variety (“the three V's”) of data. The Semantic Web offers the means to deal with variety: RDF (Resource Description Framework) models data as subject-predicate-object triples, which can be represented and interconnected to build a true Web of Data. Nonetheless, existing RDF serialization formats are highly verbose, so the remaining Big Semantic Data challenges (volume and velocity) are aggravated when big RDF collections must be stored, exchanged, and/or queried. HDT addresses this issue with a binary serialization format based on compact data structures that allows RDF to be compressed and also queried without prior decompression. Thus, HDT reduces data volume and increases retrieval velocity. However, this achievement comes at the cost of an RDF-to-HDT serialization that is expensive in terms of computational resources and time. Therefore, HDT alleviates the velocity and volume challenges for the end user, but moves the Big Data challenges to the data publisher. In this work we present HDT-MR, a MapReduce-based algorithm that serializes RDF datasets to HDT in a distributed way, reducing processing resources and time and enabling larger datasets to be compressed.

Index Terms:
Compression, HDT, MapReduce, RDF, Semantic Web, Web of Data   

