Combining Chunk Boundary and Chunk Signature Calculations for Deduplication

Witold Litwin (Witold.Litwin@dauphine.fr)1, Darrell Long (darrell@cs.ucsc.edu)2, Thomas Schwarz (tschwarz@ucu.edu.uy)3


1Université Paris Dauphine
2University of California at Santa Cruz
3Universidad Católica del Uruguay

This paper appears in: Revista IEEE América Latina

Publication Date: Jan. 2012
Volume: 10,   Issue: 1 
ISSN: 1548-0992


Abstract:
Many modern, large-scale storage solutions offer deduplication, which can achieve impressive compression rates for many workloads, especially for backups. When accepting new data for storage, deduplication checks whether parts of the data are already stored. If this is the case, then the system does not store that part of the new data but replaces it with a reference to the location where the data already resides. A typical deduplication system breaks data into chunks, hashes each chunk, and uses an index to see whether the chunk has already been stored. Variable chunk systems offer better compression, but process data byte-for-byte twice, first to calculate the chunk boundaries and then to calculate the hash. This limits the ingress bandwidth of a system. We propose a method to reuse the chunk boundary calculations in order to strengthen the collision resistance of the hash, allowing us to use a faster hashing method with fewer bytes or a much larger (256 times by adding two bytes) storage system with the same high assurance against chunk collision and resulting data loss.
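
The conventional two-pass pipeline described in the abstract can be sketched as follows. This is a minimal illustration and not the method the paper proposes: it assumes a simple polynomial rolling hash for boundary detection and SHA-256 for the chunk hash, and every name and constant in it (WINDOW, MASK, BASE, MOD, MIN_CHUNK, chunk_boundaries, store, restore) is hypothetical. It makes visible that each byte is processed twice, once by the rolling hash and once by the chunk hash.

```python
import hashlib

# All constants are illustrative assumptions, not values from the paper.
WINDOW = 16            # bytes in the sliding window
MIN_CHUNK = 16         # never cut before the window is full
MASK = (1 << 6) - 1    # cut when the low 6 bits of the rolling hash are zero
BASE = 257
MOD = (1 << 31) - 1


def chunk_boundaries(data: bytes):
    """First pass: yield (start, end) of variable-size chunks of data."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start = 0
    h = 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            yield start, i + 1
            start = i + 1
            h = 0  # restart the window at the new chunk
    if start < len(data):
        yield start, len(data)  # trailing chunk without a natural boundary


def store(data: bytes, index: dict) -> list:
    """Second pass: hash each chunk and store it only if it is new.

    index maps digest -> chunk bytes; the returned recipe is the list of
    digests needed to rebuild data.
    """
    recipe = []
    for s, e in chunk_boundaries(data):
        chunk = data[s:e]
        digest = hashlib.sha256(chunk).digest()
        index.setdefault(digest, chunk)  # duplicate chunks are not stored again
        recipe.append(digest)
    return recipe


def restore(recipe: list, index: dict) -> bytes:
    """Rebuild the original data from its recipe."""
    return b"".join(index[d] for d in recipe)
```

Storing the same data a second time with store adds no new entries to the index and returns an identical recipe, which is the deduplication effect. A real system would keep the index on persistent storage and store references to chunk locations rather than the chunk bytes themselves.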

Index Terms:
Deduplication, Algebraic Signatures   

