Monthly Archive for October, 2008

Usability Enhancements

The upcoming 1.2 release of the open source dispersed storage network software will see a number of usability enhancements. Foremost will be a vast simplification of software configuration. The 1.1 release had a number of fairly unintuitive XML configuration files. For the next release, for each package there will be a single configuration file that needs to be edited.

We will have a “three-tiered” configuration system consisting of:

  • Tier 1: Central configuration (the “registry”) for all things globally relevant or required to be configured centrally. The open source release will provide a consistent set of CLI tools to manipulate the registry.
  • Tier 2: Local administrative configuration. This will be a simple name/value “.conf” file. This file contains only settings that the typical administrator will need to change. It will not contain unstable or new “tweaky” configurations. The rule of thumb will generally be “is there a reason 80% of the user base will want to change this option?” If the answer is no, it will not be configurable in the “.conf” file. Most configuration options will contain defaults that should be sufficient for most people.
  • Tier 3: Advanced module configuration. For module-specific configuration that is “new”, “unstable”, or “tweaky”. Also for configuration that requires some structure but is not important enough to write a CLI for. Most of these advanced module configuration files will be XML files, but the exact format will be module specific. Most new options will be initially introduced in advanced module configurations.

All options in the first two tiers are expected to be fully documented before releases. We will want to be especially careful to avoid making backwards-incompatible changes in them between releases, especially in the first tier.

The third tier will likely not be fully documented and any changes may be incompatible between releases. If you don’t provide advanced configuration options, the defaults compiled into the binaries will be used. If you do provide advanced configuration files, you, as a user, will be responsible for changing them appropriately during an upgrade, as there will likely be required values that you have to add or old unsupported values that you have to remove.
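To make the layering a little more concrete, here is a rough Python sketch of how a module might resolve a setting across the tiers. Nothing here is taken from the actual software: the setting names, the `parse_conf` and `resolve` helpers, the `name = value` syntax with `#` comments, and the lookup order (registry first, then the local “.conf” file, then advanced module configuration, then compiled-in defaults) are all assumptions made for illustration.

```python
from collections import ChainMap

# Hypothetical compiled-in defaults; these names are illustrative only.
COMPILED_DEFAULTS = {
    "listen_port": "5000",
    "log_level": "info",
    "cache_size_mb": "64",
}


def parse_conf(text):
    """Parse a simple name/value file: one 'name = value' per line, '#' starts a comment."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected 'name = value', got: {raw!r}")
        settings[name.strip()] = value.strip()
    return settings


def resolve(registry, local_conf, advanced):
    """Layer the tiers over the defaults; earlier mappings win on lookup (assumed order)."""
    return ChainMap(registry, local_conf, advanced, COMPILED_DEFAULTS)


if __name__ == "__main__":
    local = parse_conf("# local overrides\nlog_level = debug\n")
    settings = resolve({"vault_name": "demo"}, local, {"cache_size_mb": "128"})
    for name in sorted(settings):
        print(name, "=", settings[name])
```

With this kind of layering, an administrator who never touches the advanced files still gets a complete set of values, and removing an override simply exposes the value from the next layer down.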

Note that some modules will have their own internal “state”, which is different from the three configuration tiers mentioned above. For example, the slice storage module contains a database that keeps track of mount points and the slice servers and vaults stored on them. The administrator will manipulate this database when he or she adds mount points to the slice server, but the main point of the database is to keep track of the internal state of the slice storage module. Events like vault creation and deletion and slice I/O will cause database modifications. Upgrades to this database occur when upgrading to a new module (for the next few releases this will probably be a manual process).

Documentation

On the forums, brokebit asks “where is documentation posted?” A timely question, because this week we’ve spent some time planning what documentation we hope to deliver for the upcoming 1.6 release.

In the past our documentation process has looked somewhat like this: our developers jot down some notes in various places, and with luck we get them to post those notes to our internal wiki. Over time use of the wiki has generally improved, so there is pretty good information there now. We use MediaWiki, which in our setup is basically impossible to search, so finding the documentation is hard. But that’s a topic for another day!

So, when it comes time for the actual commercial release, we’ve got a bunch of good resources lying around. We have one of our business analysts compile all of this information and hand it off to the consultant who does our training materials (we have an extensive training program for our resellers and customers) to compile into a complete package.

In the meantime, our open source release has a text file containing some basic installation instructions. So this is where John and I are focusing our efforts for the next release. Vicente Cano, our build master, has proposed that we try to ship 1.6 with the following documents:

  • Installation Guide
  • Admin Guide
  • User Guide

We haven’t quite hashed out exactly what these will contain or what they will look like. We have been looking to Splunk, MySQL, Apache HTTP, and the Linux From Scratch Project for inspiration, but any feedback on our ideas would be appreciated.

Even while we’re trying to figure out what we’re going to publish we need to figure out how we’re going to create the material. Our entire company is very focused on getting our commercial release out the door and we don’t have a lot of time to expend on open source documentation. This means that we need to be able to collect information from as many people as possible in a very lightweight manner.

Getting our developers to write little bits of documentation on the parts of the software they have written is potentially doable. If we were to ask for this we would have to first determine what format we wanted them to write in. They are already familiar with Microsoft Word. Word is what we use to compile our commercial documentation. I think that Word is by far the best writing tool our industry has to offer. It has incredible change tracking tools, grammar correction, and all the other drafting tools a technical writer could dream of.

Unfortunately, it is extremely bad karma to develop open source software documentation in Word. I don’t know if I could sleep at night if we committed such files to our repo. My main problem with Word files is that they are binary (even the new XML-based .docx format is saved as a ZIP archive), so proper SVN diffs are impossible. They are also unstructured, so combining them into a nicely organized book is sort of an inexact science of various style combinations. You also can’t use Word files for single-sourcing very effectively.

For the long term Vicente and I generally think that DocBook is our best bet. DocBook is an open standard, supports structured documentation and single-sourcing, and has good transformation tools for HTML and PDF. The major downside is that pure DocBook has a steep learning curve. The ideal solution would be to find a good tool for Microsoft Word to DocBook round-tripping. I’ve looked long and hard and I haven’t found any really good solutions. There are some good DocBook editors out there, but any editor is going to have a learning curve and will not be as full-featured as Word.

So we haven’t fully solved this problem. In the meantime I think that we’ll have people write in text format. They can draft in Word if they want but since we’ll have to commit something reasonable to the repo it might as well be text. To make it a bit easier, we can use AsciiDoc, a text markup similar to StructuredText. This will allow developers to write in a very natural format that can eventually be converted over to DocBook when we move further in that direction.

As a final note, I found a short article about how non-programmers use documentation that seems very insightful. I would be interested in any feedback as to what sort of documentation our fledgling community is interested in.

Slashdot! Where are we going next?

The story about our Wall Street Journal Innovation award went on Slashdot last Saturday. The post is in the form of an “Ask Slashdot” where the poster wanted information about easy distributed storage and backup solutions. This is a Slashdot perennial. I found many of the comments to that post very illuminating.

Many of the commenters felt that our solution does not stack up well against existing software that tech-savvy people deploy for personal, family, or small business use. We certainly agree that our software isn’t entirely ready for that audience, largely because of a lack of usability features. We’ve been thinking a lot recently about what we need to do to make it better.

The first problem to tackle is that of our target audience. As a company we started out thinking we would go after the consumer storage and backup market. However, this post was spot on in saying, “as for Cleversafe, the idea is as old as forward error correction, but the economics and management never seem to quite work out.” As a company it is the management of a large distributed storage network that will define our commercial offerings.

Unfortunately, for a casual user with a very limited amount of data and a low supply of bandwidth, it doesn’t seem to make sense to set up and maintain the large amount of hardware that would make our solution advantageous. For example, a basic configuration we support for our commercial product is the “8/6” vault. This means that data is split among eight different machines, and up to two of them can go down before the data is rendered inaccessible. This is fairly modest and yet would require that you have at least eight machines, which, at least for me, would be a tall order to set up at home.
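For readers who like to see the arithmetic, here is a small Python sketch of what an “8/6” layout implies. It is only an illustration: the function names are made up, the storage expansion figure assumes an idealized dispersal scheme in which each slice is 1/6 the size of the original data, and the availability estimate assumes that machines fail independently with the same probability of being up. None of this is taken from the actual implementation.

```python
from math import comb


def vault_profile(width, threshold):
    """Basic properties of a width/threshold vault (hypothetical helper)."""
    if not 0 < threshold <= width:
        raise ValueError("threshold must be between 1 and width")
    return {
        # Machines that can be lost while the data stays readable.
        "tolerated_failures": width - threshold,
        # Raw storage used per byte stored, assuming each slice is 1/threshold of the data.
        "expansion_factor": width / threshold,
    }


def availability(width, threshold, p_up):
    """Probability that at least `threshold` of `width` machines are up,
    assuming independent machines that are each up with probability p_up."""
    return sum(
        comb(width, k) * p_up**k * (1 - p_up) ** (width - k)
        for k in range(threshold, width + 1)
    )


if __name__ == "__main__":
    profile = vault_profile(8, 6)
    print("tolerated failures:", profile["tolerated_failures"])
    print("expansion factor: %.3f" % profile["expansion_factor"])
    print("availability at 99%% per-machine uptime: %.6f" % availability(8, 6, 0.99))
```

Under those assumptions the layout tolerates two failed machines while storing about a third more raw data than the original, compared with three full copies for simple triple replication that tolerates the same two failures.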

This isn’t to say that a storage network won’t ever be applicable in these environments. The core benefits of security, reliability, and performance are tremendously attractive to everyone, and to every backup scenario. Right now the really significant savings kick in when you have vast amounts of data to store. Our commercial customers typically have at least 8 terabytes of data.

So what about the “mom-and-pop” consumers out there? We seem to see a lot of new backup technology cropping up all over the world. This post mentioned CrashPlan, a cross-platform backup product written in Java. I love CrashPlan and just bought a CrashPlan PRO license for my home network. Another Slashdotter mentioned Dropbox. I haven’t personally used Dropbox, but from what I’ve seen it looks great. I think that products like these are the future of consumer storage and backup. They are generally simple and easy to use. They don’t have all the bells and whistles that products like NetBackup have, but they get the job done and it’s pretty hard to set them up incorrectly.

So how does the Cleversafe dispersed storage network fit here? What the consumer market lacked before these products was an intelligent, easy-to-use interface for accessing your data. While these new products fill that void very well, we feel they don’t offer much innovation when it comes to actual storage of your data. We want to develop a dispersed storage “platform” that companies can use as a better back-end storage solution so they can focus on innovation at the front-end.

We’d really like to see an increase in consumer-oriented deployments, and we’d love to help make this happen. As we work towards refining our open source project, we want your feedback: what features do you feel are lacking in our current release that would assist you in this style of deployment? Knowing how these deployments work out and which features are needed will let us prioritize development to better serve this end.

If you’re storing a large volume of data you should definitely give our dispersed storage network a try. If you’re interested in storing personal data, or data for consumers, our open source solution might be a good starting point for you. We’d love to hear about your experiences in deploying our software in more consumer-oriented settings, and if you ever need advice or assistance with setup, please feel free to contact us directly.

On a related note, there has been some interest in setting up a “cooperative dispersed storage network” that our community can use for testing and development. We’re working on the details of implementing such a network, and will share them in future posts.

Cleversafe Open Source vs. Commercial (Update)

Here at Cleversafe we use the open source development model. However, we still have to make money. In that vein, we felt it’s time to put out an update to last March’s post regarding the difference between our open source and commercial offerings.

If you visit www.cleversafe.com you can see our current lineup of dsNet (Dispersed Storage Network) solutions. In short, we offer an easy-to-use hardware solution which provides massive scalability, data longevity, security, and reliability. We provide three main components: the Accesser, which acts as a sort of “dispersed storage router” by exposing our dispersal through standard data protocols; the Slicestor, which holds the actual sliced data; and the dsNet Manager, which allows management and monitoring of the dsNet.

So the most important question for you: what are you not getting if you just use the open source software? To answer that, let’s talk about our goals with the open source release.

The open source release:

  • Provides complete capabilities for a usable dsNet, and
  • Will eventually include tools to make the dsNet client code embeddable, and
  • Is aimed at providing adequate performance for casual use.

On the other hand, the commercial release:

  • Provides a complete solution for a usable, manageable, and scalable dsNet, and
  • Includes a slice server implementation (the Slicestor) that is highly optimized for our specific hardware platform to provide the greatest possible performance.

In the future we are aiming to provide a full Dispersed Storage Platform SDK. But the first step is to provide an easy-to-use solution that you can download, install, and start using in under 10 minutes.

Open Source: Our Guiding Principle

Wesley Leggette and I are very proud to announce that we have been named Open Source Community Managers for the Cleversafe dispersed storage project. Our history with this project goes back to its very roots, and we’ve had a steady hand in driving forward the initial open source strategy and decisions. In our close work together over the years as core developers, Wesley and I have established a strong and successful rapport, and we really look forward to the opportunity of applying ourselves to the continued improvement of our open source interactions and policies.

Cleversafe’s decision to name us as Managers is a testament to the company’s fundamental belief in the value of cooperative experimentation and development in an open environment. This is a value deeply rooted in our daily lives, with both the technical and business staff at our company. This decision also signifies our project’s maturation. Up until this point, it has been all hands on deck to implement, test, and deploy our first version of the dispersed storage network, and as such we haven’t had the opportunity to fully dedicate ourselves to fostering innovation in our open source community. As Open Source Managers our prime directive will be just this: to tend to the care and feeding of this project, and those who wish to join us.

In the past, Wesley and I have been involved at various levels in open source development projects and users groups, both at the international and local community level, and our combined experience with these has given us a precise understanding of how best to foster this community. With no further ado, we’re going to begin execution of that vision. We’re deprecating usage of the current forums, which we’ve found to be difficult to use in carrying out technical conversation. In their place we’ve deployed a bulletin board at http://dev.cleversafe.org/forums/, complete with OpenID support. We’ve also created a development-oriented mailing list, and we likewise encourage you to subscribe to it at http://lists.cleversafe.org/listinfo/dsnet-development.

The first phase of our effort is focused on deploying the right online tools and resources to support you, as a dispersed storage user or developer. We’re intensely interested in your thoughts and opinions as to what we should use, so please use the aforementioned means to tell us of your every wish and desire. We’ll do our very best to please. Our second phase of work will target the augmentation of documentation and improvement of software usability. We’ll be blogging more about this in the very near term, as we pull everything together, so please stay tuned.

In the meantime, we need your help to grow. We believe this technology can be used for great good in and around the open source software community. Tell us where you think we can help!

All my best,
John Quigley