Learn how it works

A scientific publication on the goals, concepts and architecture decisions behind NubiSave, as well as its remaining limitations, was published and presented at the 4th IEEE International Conference on Utility and Cloud Computing (UCC) in Melbourne, Australia, in December 2011, where it received a Best Paper award. A considerably more detailed and refined article, which includes many controller optimisations, has been accepted for publication in the Elsevier journal Future Generation Computer Systems in 2012.
  • J. Spillner, G. Bombach, S. Matthischke, J. Müller, R. Tzschichholz, A. Schill: Information Dispersion over Redundant Arrays of Optimal Cloud Storage for Desktop Users, Proc. UCC 2011, pp. 1-8
  • J. Spillner, J. Müller, A. Schill: Creating Optimal Cloud Storage Systems, FGCS (in press; DOI 10.1016/j.future.2012.06.004; online June 2012)
The implementation features and development progress are documented in a study thesis, completed in May 2012 after an intensive round of improvements:
  • J. Müller: NubiSave++: Failure Resilient Distributed File System in the Cloud
A flexible cloud storage policy language, as well as multi-user, multi-configuration storage gateways which adhere to such policies, have been proposed and evaluated using NubiSave as the embedded storage controller. The results were published and presented at the 1st Latin American Conference on Cloud Computing and Communications (LatinCloud) in Porto Alegre, Brazil in November 2012.
  • J. Spillner, A. Schill: Flexible Data Distribution Policy Language and Gateway Architecture, Proc. LatinCloud 2012, pp. 1-6
For a less technical explanation, please read the background information below.

Motivation

Mozilla's "take back the web" applied to cloud computing means "take back data sovereignty"! NubiSave is designed to achieve exactly this by providing a local cloud storage controller, the core building block of an automated cloud storage gateway. All of this happens within the user's own realm, either directly on the desktop or under the control of an administrator responsible for securing the data of a group of people in a local network.

Technical Background

Storing data locally on disks, which are subject to breakage, theft and technical limits on capacity and performance, has led to sophisticated concepts for encryption and for combining disks into striped (RAID-0), fully redundant (RAID-1) or parity-based redundant (e.g. RAID-5) arrays. In the cloud, however, users typically entrust all of their data to a single storage provider, with or without encryption. Even encrypted cloud storage remains exposed to unauthorised access to the storage area, to deletion, and to brute-force attacks on the encryption. Data dispersal in the cloud (RAIC-n for n >= 1) ensures that each provider only receives a part of the data, while at the same time achieving higher long-term availability and access performance. NubiSave introduces techniques, called RAOC-n, that aim at optimality for the user across the whole lifecycle of such an array: when it is constructed over a set of storage providers, while it is continuously used, and when it is adapted to changes among the cloud providers.
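The availability argument can be made concrete with a short calculation. The Python sketch below assumes that providers fail independently and that any k of the n fragments are enough to reconstruct the data; it computes the resulting array availability and then, as a simple stand-in for RAOC-style optimisation, brute-forces the cheapest provider subset that meets an availability target. The provider names, availabilities and prices are hypothetical, and NubiSave's actual optimisation criteria may differ.

```python
from itertools import combinations


def array_availability(availabilities, k):
    """Probability that at least k of the providers are reachable,
    assuming independent failures (Poisson-binomial distribution)."""
    dist = [1.0]  # dist[j]: probability that exactly j providers so far are up
    for a in availabilities:
        nxt = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            nxt[j] += p * (1 - a)  # this provider is down
            nxt[j + 1] += p * a    # this provider is up
        dist = nxt
    return sum(dist[k:])


def cheapest_array(providers, k, target, size_gb):
    """Brute-force the cheapest k-of-n array (n >= k) reaching the target
    availability; each provider stores a fragment of size size_gb / k."""
    best = None
    for n in range(k, len(providers) + 1):
        for subset in combinations(providers, n):
            avail = array_availability([p["availability"] for p in subset], k)
            if avail < target:
                continue
            cost = sum(p["price_per_gb"] for p in subset) * size_gb / k
            if best is None or cost < best[0]:
                best = (cost, [p["name"] for p in subset], avail)
    return best  # (cost, provider names, availability) or None


# Hypothetical providers; the figures are illustrative, not measurements.
providers = [
    {"name": "A", "availability": 0.99, "price_per_gb": 0.10},
    {"name": "B", "availability": 0.99, "price_per_gb": 0.08},
    {"name": "C", "availability": 0.99, "price_per_gb": 0.12},
    {"name": "D", "availability": 0.95, "price_per_gb": 0.03},
]

print(round(array_availability([0.99, 0.99, 0.99], 2), 6))  # 0.999702
print(cheapest_array(providers, k=2, target=0.999, size_gb=100))
```

With three providers at 99 % each, a 2-of-3 array is reachable about 99.97 % of the time, compared with 99 % for any single provider, while each provider holds only half of the data.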

Architecture

NubiSave allows splitter (1:n), modifier (1:1) and leaf transport modules to be chained flexibly in a hierarchical tree. This makes it possible, for instance, to send one part of the data unmodified to one provider, an encrypted part to a second provider and a compressed part to a third provider. One possible setup consisting of three storage providers is shown in the diagram below:


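The chaining of modules can be pictured as a small composite structure. The Python sketch below is illustrative only: the class names `Splitter`, `Modifier` and `Transport` are hypothetical rather than NubiSave's interfaces, the splitter merely stripes the data byte-wise across its children without any redundancy, `zlib` compression stands in for a compression module, and `toy_xor` is a placeholder for an encryption module such as EncFS that provides no real security. It mirrors the three-provider example: one part unmodified, one "encrypted", one compressed.

```python
import zlib


class Transport:
    """Leaf module: stands in for one storage provider."""

    def __init__(self, name):
        self.name = name
        self.stored = b""

    def write(self, data):
        self.stored = data

    def read(self):
        return self.stored


class Modifier:
    """1:1 module: transforms data on the way to its single child and back."""

    def __init__(self, encode, decode, child):
        self.encode, self.decode, self.child = encode, decode, child

    def write(self, data):
        self.child.write(self.encode(data))

    def read(self):
        return self.decode(self.child.read())


class Splitter:
    """1:n module: stripes data byte-wise across its children (no redundancy)."""

    def __init__(self, children):
        self.children = children

    def write(self, data):
        n = len(self.children)
        for i, child in enumerate(self.children):
            child.write(data[i::n])

    def read(self):
        parts = [child.read() for child in self.children]
        out = bytearray(sum(len(p) for p in parts))
        for i, part in enumerate(parts):
            out[i::len(parts)] = part  # re-interleave the stripes
        return bytes(out)


def toy_xor(data, key=0x5A):
    """Placeholder for an encryption module; NOT secure encryption."""
    return bytes(b ^ key for b in data)


plain, hidden, packed = Transport("p1"), Transport("p2"), Transport("p3")
root = Splitter([
    plain,                                             # unmodified
    Modifier(toy_xor, toy_xor, hidden),                # "encrypted"
    Modifier(zlib.compress, zlib.decompress, packed),  # compressed
])

data = b"NubiSave disperses data over several providers. " * 3
root.write(data)
assert root.read() == data
```

Because a `Splitter` exposes the same `write`/`read` pair as a `Transport`, splitters can be nested inside one another in this model.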
Cloud Providers and Modifiers

Which cloud provider and modifier modules can be used with NubiSave? For a full list, visit the listing at the FUSE website. Selected modules are presented here. Overlay modifier modules include, among others:
  • EncFS for encryption
  • Lessfs for deduplication and compression
  • Gitfs for versioning
  • and of course, nested splitters
Cloud provider transport modules include, among others:
  • Local file systems and in-kernel network file systems: CIFS, SMB, NFS, hard disks, USB sticks, ...
  • Generic protocols: SSHFS, FuseDAV + DavFS2 for WebDAV, ObexFS for Bluetooth devices, CurlFTPFS for FTP, ...
  • Specific providers: CloudFusion (Dropbox, SugarSync), S3QL (Google Storage, Amazon S3, Eucalyptus Walrus, OpenStack Swift), Wuala, BoxFS, ...

Unique Features

NubiSave aims to take distributed cloud storage to a new level by experimenting with various optimisations in the handling of providers and data beyond the initial optimal configuration. Such features include:
  • Redundancy: NubiSave integrates erasure codes through the jErasure library and JigDFS, including space-optimal maximum-distance separable (MDS) codes such as Cauchy Reed-Solomon as well as minimum-bandwidth and minimum-storage regenerating codes. For information-theoretic security, a secret sharing scheme would be used instead. Zero redundancy is supported as well. Checksums are calculated for all fragments.
  • Scheduling: NubiSave offers two storage strategies: it uses the providers either all in parallel or in a round-robin scheme.
  • Cache: NubiSave achieves distribution transparency through a flexible multistage cache.
  • Streaming: NubiSave works on streams and therefore does not need to buffer large files, which makes it suitable for small and embedded devices (and robots, too).
  • Sessions: NubiSave can also store the metadata file in the cloud, making it possible to access all data from multiple devices with just a small pointer to this file.
  • Chunking: NubiSave introduces an intermediate layer of chunks between a large file and its small fragments. Chunks (blocks) can be written and retrieved separately, so small changes do not require transferring the entire file. Sparse files receive special support.
  • Configuration: NubiSave can be configured through GUIs or plain text files. Modules as well as providers can be added and removed dynamically at runtime. Multiple distributed NubiSave installations can be connected to work across machines, which increases scalability.
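The redundancy feature rests on the maximum-distance separable property: any k of the n fragments suffice, and the n/k storage overhead is the minimum for tolerating n - k losses. The Python sketch below illustrates that property with a systematic Reed-Solomon-style code built by polynomial interpolation over the prime field GF(257). It is neither Cauchy Reed-Solomon nor the jErasure API, only a teaching substitute: fragments are lists of integers (parity values may be 256, so they do not fit a byte), and the per-fragment SHA-256 `checksum` helper is a hypothetical format mirroring the fragment checksums mentioned above.

```python
import hashlib

P = 257  # prime field; every byte value 0..255 is a field element


def _interpolate(xs, ys, x):
    """Evaluate at x the polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total


def encode(data, k, n):
    """Systematic k-of-n code: fragments 0..k-1 carry the data, k..n-1 parity.
    Each k-byte stripe defines a polynomial f with f(1..k) = the stripe bytes;
    fragment i stores f(i + 1)."""
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < 257")
    data += bytes(-len(data) % k)  # zero-pad to a multiple of k
    xs = list(range(1, k + 1))
    fragments = [[] for _ in range(n)]
    for s in range(0, len(data), k):
        stripe = list(data[s:s + k])
        for i in range(n):
            fragments[i].append(stripe[i] if i < k else _interpolate(xs, stripe, i + 1))
    return fragments


def decode(available, k, length):
    """Rebuild the original bytes from any k fragments given as {index: fragment}."""
    if len(available) < k:
        raise ValueError("need at least k fragments")
    idx = sorted(available)[:k]
    xs = [i + 1 for i in idx]
    out = bytearray()
    for s in range(len(available[idx[0]])):
        ys = [available[i][s] for i in idx]
        out.extend(_interpolate(xs, ys, x) for x in range(1, k + 1))
    return bytes(out[:length])


def checksum(fragment):
    """SHA-256 over the fragment; symbols can be 256, so use 2 bytes each."""
    return hashlib.sha256(b"".join(v.to_bytes(2, "big") for v in fragment)).hexdigest()


data = b"Information dispersal over cloud providers"
k, n = 3, 5
fragments = encode(data, k, n)
sums = [checksum(f) for f in fragments]

# Two providers are lost; verify the survivors' checksums, then decode.
survivors = {i: f for i, f in enumerate(fragments) if i not in (0, 3)}
assert all(checksum(f) == sums[i] for i, f in survivors.items())
assert decode(survivors, k, len(data)) == data
```

Setting k = n gives the zero-redundancy case. Shamir's secret sharing uses the same interpolation but places the secret in the constant term and fills the other coefficients with random values, so that fewer than k shares reveal nothing about it.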
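The two scheduling strategies can be contrasted in a few lines. How NubiSave maps data to providers under each strategy is not spelled out here, so this Python sketch is an interpretation under stated assumptions: each provider is modelled as a hypothetical `upload` callable, the parallel strategy sends one fragment to every provider concurrently through a thread pool, and the round-robin strategy hands successive items (for example chunks) to one provider after another.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def store_parallel(fragments, providers):
    """Upload fragment i to provider i, with all uploads running concurrently."""
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = [pool.submit(upload, frag) for upload, frag in zip(providers, fragments)]
        return [f.result() for f in futures]  # results in provider order


def store_round_robin(items, providers):
    """Hand each item to the next provider in turn, one upload at a time."""
    turn = cycle(providers)
    return [next(turn)(item) for item in items]


def make_provider(name, log):
    """Hypothetical provider: records the upload and returns its own name."""
    def upload(data):
        log.append((name, data))
        return name
    return upload


log = []
providers = [make_provider(name, log) for name in ("p1", "p2", "p3")]
print(store_parallel([b"f0", b"f1", b"f2"], providers))            # ['p1', 'p2', 'p3']
print(store_round_robin([b"c0", b"c1", b"c2", b"c3"], providers))  # ['p1', 'p2', 'p3', 'p1']
```

In this model, a parallel write completes only when the slowest provider has finished, whereas round-robin keeps a single upload in flight and spreads the items evenly over the providers.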
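Streaming and chunking fit together naturally. The generator below reads any binary stream in fixed-size chunks, so memory use stays bounded regardless of file size; per-chunk SHA-256 digests let a later write detect which chunks changed and upload only those, and all-zero chunks are recorded as holes rather than stored, as a simple model of sparse-file support. This is a Python sketch under assumptions: the 64 KiB chunk size, the function names and the manifest layout are hypothetical, not NubiSave's format.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not a NubiSave default


def chunk_stream(stream, size=CHUNK_SIZE):
    """Yield (index, chunk) pairs; only one chunk is held in memory at a time.
    Assumes a buffered stream that returns full chunks except at the end."""
    index = 0
    while chunk := stream.read(size):
        yield index, chunk
        index += 1


def build_manifest(stream, size=CHUNK_SIZE):
    """Map chunk index to its SHA-256 digest, or to None for an all-zero chunk
    (a hole in a sparse file that need not be stored)."""
    return {
        index: None if not any(chunk) else hashlib.sha256(chunk).hexdigest()
        for index, chunk in chunk_stream(stream, size)
    }


def changed_chunks(old, new):
    """Indices of non-hole chunks that are new or differ from the old manifest."""
    return sorted(i for i, digest in new.items()
                  if digest is not None and old.get(i) != digest)


v1 = b"AAAA" + bytes(4) + b"CCCC"
v2 = b"AAAA" + bytes(4) + b"CCCD"
m1 = build_manifest(io.BytesIO(v1), size=4)
m2 = build_manifest(io.BytesIO(v2), size=4)
print(m1[1])                   # None: the zero-filled middle chunk is a hole
print(changed_chunks({}, m1))  # [0, 2]: the first upload skips the hole
print(changed_chunks(m1, m2))  # [2]: only the modified chunk is re-sent
```

A complete implementation would also have to delete remote chunks that turned into holes or disappeared; the sketch only determines what must be uploaded.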
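The multistage cache can be pictured as a chain of lookups from the fastest to the slowest tier. The Python sketch below assumes two in-memory LRU stages of different capacities in front of a hypothetical `fetch_from_providers` function that stands for retrieving and reassembling data from the providers; the number of stages, their sizes and the promotion policy are illustrative assumptions, not a description of NubiSave's cache.

```python
from collections import OrderedDict

_MISS = object()  # sentinel, so that falsy values can be cached too


class LRUStage:
    """One cache stage with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return _MISS
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


class MultiStageCache:
    """Consult the stages from fastest to slowest; on a full miss, call fetch."""

    def __init__(self, stages, fetch):
        self.stages = stages
        self.fetch = fetch

    def get(self, key):
        for depth, stage in enumerate(self.stages):
            value = stage.get(key)
            if value is not _MISS:
                for faster in self.stages[:depth]:
                    faster.put(key, value)  # promote the hit to faster stages
                return value
        value = self.fetch(key)  # expensive path: reassemble from the providers
        for stage in self.stages:
            stage.put(key, value)
        return value


calls = []


def fetch_from_providers(key):
    """Hypothetical slow path that would retrieve and decode fragments."""
    calls.append(key)
    return f"contents of {key}".encode()


cache = MultiStageCache([LRUStage(2), LRUStage(8)], fetch_from_providers)
for key in ["a", "b", "c", "a", "b"]:
    cache.get(key)
print(calls)  # ['a', 'b', 'c']: the repeated reads of a and b hit the second stage
```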