Combining Chunk Boundary and Chunk Signature Calculations for Deduplication
Witold Litwin (Witold.Litwin@dauphine.fr)1, Darrell Long (firstname.lastname@example.org)2, Thomas Schwarz (email@example.com)3
1 Université Paris Dauphine, 2 University of California at Santa Cruz, 3 Universidad Católica del Uruguay
This paper appears in: Revista IEEE América Latina
Publication Date: Jan. 2012
Volume: 10, Issue: 1
Many modern, large-scale storage solutions offer deduplication, which can achieve impressive compression rates for many workloads, especially backups. When accepting new data for storage, a deduplicating system checks whether parts of the data are already stored. If so, the system does not store that part of the new data again but replaces it with a reference to the location where the data already resides. A typical deduplication system breaks data into chunks, hashes each chunk, and uses an index to see whether the chunk has already been stored. Variable-size chunk systems offer better compression, but process the data byte by byte twice, first to calculate the chunk boundaries and then to calculate the hash. This limits the ingress bandwidth of a system. We propose a method to reuse the chunk boundary calculations in order to strengthen the collision resistance of the hash. This allows us to use either a faster hashing method with fewer bytes, or a much larger storage system (256 times larger by adding two bytes) with the same high assurance against chunk collisions and the resulting data loss.
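One plausible reading of the "256 times by adding two bytes" figure follows from the standard birthday approximation. The worked step below is a sketch under the assumption that chunk signatures behave like uniformly random b-bit values; the symbols n (number of stored chunks) and b (signature length in bits) are introduced here for illustration and are not taken from the paper.

```latex
% Birthday approximation for n chunks with b-bit signatures
\[
  P_{\mathrm{coll}}(n, b) \approx \frac{n^{2}}{2^{\,b+1}}
\]
% Adding two bytes (16 bits) while storing 256 = 2^8 times as many chunks
\[
  P_{\mathrm{coll}}(2^{8} n,\; b + 16)
  \approx \frac{(2^{8} n)^{2}}{2^{\,b+17}}
  = \frac{2^{16} n^{2}}{2^{\,b+17}}
  = \frac{n^{2}}{2^{\,b+1}}
  \approx P_{\mathrm{coll}}(n, b)
\]
```

Under this assumption, sixteen extra signature bits keep the collision probability unchanged when the number of stored chunks grows by a factor of 256.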
Deduplication, Algebraic Signatures
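The conventional two-pass pipeline described in the abstract can be sketched as follows. This is a minimal illustration of the baseline, not the combined method the paper proposes: it assumes a simple polynomial rolling hash for boundary detection and SHA-256 as the chunk hash, and all names and constants (`WINDOW`, `MASK`, `DedupStore`, and so on) are hypothetical choices made for the example.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the paper.
WINDOW = 16              # bytes covered by the rolling hash
BASE = 257               # polynomial base of the rolling hash
MOD = (1 << 31) - 1      # modulus of the rolling hash
MASK = (1 << 6) - 1      # boundary when the low 6 bits of the hash are zero
MIN_CHUNK = 16           # never cut a chunk shorter than this
MAX_CHUNK = 256          # always cut a chunk at this length


def chunk_boundaries(data: bytes):
    """First pass: yield (start, end) offsets of content-defined chunks."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start = 0
    h = 0
    for i, b in enumerate(data):
        pos = i - start               # offset of this byte inside the chunk
        if pos >= WINDOW:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        length = pos + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
            h = 0                     # restart the window for the next chunk
    if start < len(data):
        yield start, len(data)        # trailing partial chunk


class DedupStore:
    """Toy in-memory chunk store keyed by chunk hash."""

    def __init__(self):
        self.index = {}               # digest -> chunk bytes

    def put(self, data: bytes):
        """Store data and return its recipe (the list of chunk digests)."""
        recipe = []
        for s, e in chunk_boundaries(data):
            chunk = data[s:e]
            # Second pass over the same bytes: hash the chunk.
            digest = hashlib.sha256(chunk).digest()
            # Store the chunk only if its digest is not yet in the index.
            self.index.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Reassemble the original data from a recipe."""
        return b"".join(self.index[d] for d in recipe)
```

Every byte is touched once by the rolling hash in `chunk_boundaries` and once more by `hashlib.sha256` in `put`. That double pass is the ingress cost the abstract refers to. Storing the same data a second time returns the same recipe and adds no new entries to the index.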