What do deduplication, incremental forever, and the Olsen twins have to do with each other? It’s all about duplicate data. Mary-Kate and Ashley Olsen are fraternal twins. Identical twins share 100% of their DNA. Fraternal twins share about 50% of their DNA – the same as any other siblings.
If we created a DNA database for a set of identical twins, we would only need to store that DNA information once, since we know the DNA is identical. However, if we created a DNA database for the Olsen twins, we would either need to store two completely separate sets of data or we would need techniques for determining which data is unique and which data is duplicated. Data deduplication is precisely that set of techniques for identifying and eliminating duplicate data.
File-level deduplication, block-level deduplication, byte-level deduplication, and incremental forever are all techniques that eliminate duplicate data. Theoretically, when 100% of the data is duplicated, the data reduction achieved is identical for each technique. Practically, there are significant differences among these techniques in the time and computational resources required. It’s when only some data is duplicated, as with the DNA of fraternal twins, that the data reduction varies. The time and computational resources required by each of these techniques also vary in this case.
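To make the idea concrete, here is a minimal sketch of block-level deduplication, illustrating the principle shared by all of these techniques: identical chunks of data are stored only once and referenced by a fingerprint. This is not the implementation of any particular product; the fixed block size, the SHA-256 fingerprint, and the function names are illustrative assumptions.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real systems use KB-sized blocks

def dedupe(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks and store each unique block once,
    keyed by its SHA-256 digest. Returns the ordered list of digests
    (a 'recipe') needed to reconstruct the original data."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # a duplicate block costs nothing extra
        recipe.append(digest)
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its recipe of block digests."""
    return b"".join(store[d] for d in recipe)

store = {}
backup_a = b"AAAABBBBCCCCDDDD"  # first backup: four 4-byte blocks
backup_b = b"AAAABBBBCCCCEEEE"  # second backup: shares 3 of 4 blocks, like a fraternal twin

recipe_a = dedupe(backup_a, store)
recipe_b = dedupe(backup_b, store)

assert restore(recipe_a, store) == backup_a
assert restore(recipe_b, store) == backup_b
print(len(store))  # 5 unique blocks stored instead of 8
```

Like the fraternal twins, the two backups overlap heavily but not completely, so the store holds only the five distinct blocks rather than all eight. File-level deduplication applies the same idea at whole-file granularity, while byte-level deduplication works at a finer granularity still.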
In this paper, we’ll compare and contrast the advantages and disadvantages of each of these techniques and explain why incremental forever, when combined with byte-level deduplication, is the superior methodology for reducing redundant data in the most efficient manner possible. Further, we’ll discuss the advantages and disadvantages of both physical and virtual backup appliances versus dedicated deduplication devices.