De-Duplication


There is a popular misconception that “data de-duplication” and “single instance storage” are the same thing; they are not. Single instance storage retains one copy of an identical document: if a 1 MB e-mail is sent to 100 people, the message is stored once rather than the normal 100 times. As you can see, single instance storage greatly reduces the amount of data stored on file servers and e-mail servers.

Data de-duplication, on the other hand, takes single instance storage one step further by looking at patterns of information within a data block.

Quantum’s DXi products use block-based de-duplication: they segment a dataset into variable-length blocks and look for repeated blocks. When they find a block they’ve seen before, instead of storing it again, they store a pointer to the original. Reading the file is simple – the sequence of pointers makes sure all the blocks are accessed in the right order.

The blocks come from ordinary files – a Word document, for example, occupies a set of blocks on the disk. De-duplication computes a hash for each block, uses the hashes to identify identical blocks, and replaces duplicates with references to a single stored copy. Data de-duplication also dramatically reduces the backup window, because all data is sent to disk first and is de-duplicated afterwards.
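To make the hash-and-pointer idea concrete, here is a minimal sketch in Python. It is not Quantum’s implementation: it uses fixed-size 4 KB blocks for simplicity (the DXi uses variable-length blocks), and the `DedupStore` class and its methods are illustrative names only.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks for simplicity; real systems vary the length


class DedupStore:
    """Toy block store: keeps each unique block once, keyed by its hash."""

    def __init__(self):
        self.blocks = {}  # hash -> block bytes

    def write(self, data: bytes) -> list:
        """Split data into blocks; store only new blocks, return a pointer list."""
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            if h not in self.blocks:      # seen before? keep only one copy
                self.blocks[h] = block
            pointers.append(h)            # pointer to the stored block
        return pointers

    def read(self, pointers: list) -> bytes:
        """Follow the pointer sequence to reassemble the original data."""
        return b"".join(self.blocks[h] for h in pointers)


store = DedupStore()
payload = b"A" * 8192 + b"B" * 4096 + b"A" * 8192   # lots of repeated content
ptrs = store.write(payload)
assert store.read(ptrs) == payload
print(len(ptrs), "pointers,", len(store.blocks), "unique blocks stored")
# → 5 pointers, 2 unique blocks stored
```

The payload occupies five blocks, but only two are unique, so only two are stored – the other three are just pointers, which is where the space saving comes from.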

It should be pretty clear that data de-duplication technology works best when it sees sets of data with lots of repeated segments. It does not work well with video or sound files, which contain little repetition. The Quantum range of DXi data de-duplication systems allows the device to be seen as network attached storage, Fibre Channel or iSCSI storage. The Quantum DXi series also uses variable-block de-duplication technology, which increases the data reduction ratio.

As with all IT investments, data de-duplication must make business sense to merit adoption.  At one level, the value is pretty easy to establish.  Adding disk to your backup strategy can provide faster backup and restore performance, as well as give you RAID levels of fault tolerance.  But with conventional storage technology, the amount of disk people need for backup just costs too much.  Data de-duplication solves that problem for many users because it lets them put 10 to 50 times more backup data on the same amount of disk.
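The 10x–50x figure is easy to turn into capacity numbers. A small sketch, assuming a hypothetical 10 TB of raw disk (the ratios are the ones quoted above; the function name is illustrative):

```python
def effective_capacity_tb(raw_tb: float, dedup_ratio: float) -> float:
    """Backup data that fits on raw_tb of disk at a given de-duplication ratio."""
    return raw_tb * dedup_ratio


raw_disk_tb = 10.0                       # assumed raw disk capacity
for ratio in (10, 50):                   # the 10x-50x range quoted above
    print(f"{ratio}:1 -> {effective_capacity_tb(raw_disk_tb, ratio):.0f} TB")
# → 10:1 -> 100 TB
# → 50:1 -> 500 TB
```

In other words, the same shelf of disk holds between 100 TB and 500 TB of backup data, which is what changes the economics.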

Disaster recovery protection

The minimum disaster recovery (DR) protection you need is to make backup data safe from site damage.  After all, equipment and applications can be replaced, but digital assets may be irreplaceable.  And no matter how many layers of redundancy a system has, when all copies of anything are stored on a single hardware system, they are vulnerable to fires, floods, or other site damage. For most users, removable media provides site loss protection.  And it’s one of the big reasons that disk backup isn’t used more: When backup data is on disk, it just sits there.  You have to do something else to get DR protection.  People talk about replicating backup data over networks, but almost nobody actually does it: Backup sets are too big and network bandwidth is too limited.

Data de-duplication changes all that – it finally makes remote replication of backup practical and smart.  How does it work?  Just like you only store the new blocks in each backup, you only have to replicate the new blocks.  Suppose 1% of a 500 GB backup has changed since a previous backup. That means you only have to move 5 GB of data to keep the two systems synchronised – and you can move that data in the background over several hours.  That means you can use a standard WAN to replicate backup sets.
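The arithmetic above can be sketched directly. The 500 GB backup and 1% change rate come from the text; the 10 Mbit/s WAN link is an assumed figure for illustration, as are the function names:

```python
def replication_delta_gb(backup_gb: float, change_rate: float) -> float:
    """New data that must cross the WAN after de-duplication."""
    return backup_gb * change_rate


def transfer_hours(delta_gb: float, wan_mbps: float) -> float:
    """Hours to move delta_gb over a link of wan_mbps (megabits per second)."""
    bits = delta_gb * 8 * 1000**3        # decimal GB -> bits
    return bits / (wan_mbps * 1e6) / 3600


delta = replication_delta_gb(500, 0.01)  # the article's 500 GB, 1% changed
print(f"{delta:.0f} GB to replicate")               # → 5 GB to replicate
print(f"{transfer_hours(delta, 10):.1f} h on a 10 Mbit/s WAN")  # → 1.1 h
```

At roughly an hour of background transfer per day, an ordinary WAN link is comfortably enough, which is the point of the paragraph above.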

For disaster recovery, that means you can have an offsite replica image of all your backup data every day, and you can reduce the amount of removable media you handle.  That’s especially nice when you have smaller sites that don’t have IT staff.  Less removable media can mean lower costs and less risk.  Daily replication means better protection.  It’s a win-win situation.

How do you get them synched up in the first place?  The first replication event may take longer, or you can co-locate devices and move data the first time over a faster network, or you can put backup data at the source site on tape and copy it locally onto the target system.  After that first sync-up is finished, replication only needs to move the new blocks. What about tape?  Do you still need it?  Disk-based de-duplication replication can reduce the amount of tape you use, but most IT departments combine the technologies, using tape for longer term retention.  This approach makes sense to most users.  If you want to keep data for six months or three years or seven years, tape provides the right economics and portability.

The best solution providers will help you get the right balance.  Quantum lets you manage the disk and tape systems from a single management console, and supports all your backup systems with the same service team.