SDFS: A Robust Deduplication File System for Linux

by Lisa Hoover - Mar. 25, 2010

http://www.flickr.com/photos/kathika/2811490521/

If you're paying for offsite data storage by the byte, you naturally want to keep costs down by making sure you're not storing several copies of the same data. Deduplicating your data is an effective way to save transit time, disk space, and maybe even money. SDFS, an open source deduplicating file system, helps you get the job done.

SDFS is designed to support virtual environments and includes additional functionality for VMware, Xen, and KVM. Other features include:

* Scalability - SDFS can dedup a petabyte or more of data, handling over 3 TB of data per GB of memory at a 128 KB chunk size

* Speed - SDFS can perform deduplication/redup at line speed, 150 MB/s or more

* VMware support - SDFS works with VMs and can dedup at a 4 KB block size, which is required to dedup virtual machines effectively

* Flexible storage - deduplicated data can be stored locally, on the network across multiple nodes, or in the cloud

* Inline and Batch Mode deduplication - The file system can dedup inline or periodically based on needs. This can be changed on the fly

* File and Folder Snapshot support - Support for file or folder level snapshots
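
To make the chunk sizes in the list above concrete, here is a minimal sketch of fixed-size chunk deduplication in Python. It is not SDFS code: the SHA-256 hash, the in-memory dict used as a chunk store, and the function names dedup_store, rebuild, and index_bytes_per_chunk are all assumptions made for illustration. The last function only works through the arithmetic behind the "3TB per gig of memory at 128k chunk size" figure, assuming binary units.

```python
import hashlib


def dedup_store(data: bytes, chunk_size: int, store: dict) -> list:
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns the ordered list of chunk hashes needed to rebuild the data.
    The dict-based store and SHA-256 are illustrative assumptions.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a repeated chunk is not stored again
        recipe.append(digest)
    return recipe


def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its list of chunk hashes."""
    return b"".join(store[digest] for digest in recipe)


def index_bytes_per_chunk(data_per_gb_ram: int, chunk_size: int) -> float:
    """Memory available per chunk if 1 GB of RAM tracks data_per_gb_ram bytes."""
    chunks = data_per_gb_ram // chunk_size
    return (1 << 30) / chunks


if __name__ == "__main__":
    store = {}
    # Four 4 KB blocks, three of them identical, as in cloned VM images.
    data = b"A" * 4096 + b"B" * 4096 + b"A" * 8192
    recipe = dedup_store(data, 4096, store)
    assert rebuild(recipe, store) == data
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
    # Rough per-chunk memory budget implied by 3 TB per GB at 128 KB chunks.
    print(index_bytes_per_chunk(3 * (1 << 40), 128 * 1024), "bytes per chunk")
```

Smaller chunks find more duplicates but mean more chunks to track for the same amount of data, which is one plausible reason the memory figure is quoted at the larger 128k chunk size rather than the 4k size used for virtual machines.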

Think deduplication isn't important, or that it won't impact your company's bottom line? Think again. Tech writer Chris Poelker explains that deduplicating not only has the potential to save your company money, it also benefits the environment. "Data deduplication goes a long way toward reducing data storage costs by making storage much more efficient," he says, "which in turn can reduce the overall footprint inside the data center. Just think: if by deduplicating your data you can store the exact same amount of information in less than one-tenth the footprint, imagine how much money and energy you could save in power and cooling costs."

System requirements for running SDFS include an x64 Linux distribution, FUSE 2.8, 2 GB of RAM, and Java 7. It's licensed under the GPLv2 and available for free download at the project's Google Code page.

Opendedup

Photo courtesy of mrkathika.






3 Comments
 

This is one major thing that Linux needs, a generally available dedup file system. Anyone have performance numbers?



How do you get to know the current status of this project? For Linux users to consider it seriously, the project needs to publish standard benchmark results [read/write: sequential, random, small file, large file].

How does it handle deletion, and how does read performance hold up over long use?

It also needs to be available on all popular Linux variants.



Currently using Nexenta Core ZFS, but I don't know enough about Solaris to tweak it the way I want it.


SDFS would be a nearly perfect replacement for me if it also supported the robust checksumming and SSD caching features you see in ZFS. Anyone know of some good articles on the more technical details of SDFS?

