SDFS: A Robust Deduplication File System for Linux
If you're paying for offsite data storage by the byte, you naturally want to keep costs down by making sure you're not storing several copies of the same data. Deduplicating your data is an effective way to save transit time, disk space, and maybe even money. The open source SDFS file system helps you get the job done.
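To see what block-level deduplication buys you, here is a minimal sketch using only standard Linux tools (the file names, the 4 KB chunk size, and the use of `sha256sum` as the fingerprint are illustrative; they are not SDFS's internal format): a file made of 100 identical 4 KB blocks boils down to a single unique chunk.

```shell
# Build a demo file containing 100 copies of the same 4 KB block.
mkdir -p /tmp/dedup-demo && cd /tmp/dedup-demo
head -c 4096 /dev/zero > block
for i in $(seq 1 100); do cat block; done > sample.bin

# Split into fixed-size 4 KB chunks and fingerprint each one.
split -b 4096 sample.bin chunk.
total=$(ls chunk.* | wc -l)
unique=$(sha256sum chunk.* | awk '{print $1}' | sort -u | wc -l)

# A deduplicating store only has to keep the unique chunks.
echo "chunks: $total, unique: $unique"   # chunks: 100, unique: 1
```

A real deduplication file system does this transparently on every write, keeping one copy of each unique chunk and pointing duplicates at it.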
SDFS is designed to support virtual environments and includes additional functionality for VMware, Xen, and KVM. Other features include:
* Scalability - SDFS can deduplicate a petabyte or more of data, handling over 3 TB per gigabyte of memory at a 128 KB chunk size
* Speed - SDFS can perform deduplication and rehydration at line speed, 150 MB/s and up
* VMware support - SDFS can deduplicate at a 4 KB block size, which is required to deduplicate virtual machines effectively
* Flexible storage - deduplicated data can be stored locally, on the network across multiple nodes, or in the cloud
* Inline and batch mode deduplication - the file system can deduplicate inline or periodically, depending on your needs, and the mode can be changed on the fly
* File and folder snapshot support - snapshots can be taken at the file or folder level
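In practice, getting a deduplicated volume on disk looks much like any other file system. The sketch below uses the `mkfs.sdfs` and `mount.sdfs` commands that SDFS ships; the volume name, capacity, and mount point are illustrative, and exact options may vary between SDFS releases, so check the project's documentation for your version.

```shell
# Create a deduplicated volume (name and capacity are examples).
sudo mkfs.sdfs --volume-name=pool0 --volume-capacity=256GB

# Mount it; anything written here is deduplicated transparently.
sudo mkdir -p /media/pool0
sudo mount.sdfs pool0 /media/pool0
```

Once mounted, applications read and write files normally; the chunking and fingerprinting happen underneath via FUSE.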
Think deduplication isn't important, or that it won't impact your company's bottom line? Think again. Tech writer Chris Poelker explains that deduplicating not only has the potential to save your company money, it also benefits the environment. "Data deduplication goes a long way toward reducing data storage costs by making storage much more efficient," he says, "which in turn can reduce the overall footprint inside the data center. Just think: if by deduplicating your data you can store the exact same amount of information in less than one-tenth the footprint, imagine how much money and energy you could save in power and cooling costs."
System requirements for running SDFS include a 64-bit (x64) Linux distribution, FUSE 2.8, 2 GB of RAM, and Java 7. It's licensed under the GPLv2 and available for free download at the project's Google Code page.
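Before installing, you can confirm the requirements above with standard Linux tools (this is just a convenience check, not an SDFS installer step):

```shell
# Quick sanity check of the SDFS prerequisites.
arch=$(uname -m)                                    # want x86_64
mem_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
echo "arch: $arch, memory: $((mem_kb / 1024)) MB"   # want 2048 MB or more
java -version 2>&1 | head -n 1 || echo "Java not found"
fusermount -V 2>/dev/null || echo "FUSE not found"
```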
Photo courtesy of mrkathika.