Confirmation the future of storage is dispersal

With the recent EMC Atmos announcement, it is great to see momentum behind using Information Dispersal Algorithms or Forward Error Correction to increase data protection. A growing number of vendors are beginning to agree with us that RAID and replication aren’t going to work for massive storage systems – particularly for storage clouds. And we’ve always said we can’t be the only vendor out there who understands there is a better approach.

Using IDAs or FEC is the “how” of these types of new storage platforms, and table stakes to get started. We focused our first three years on getting the “how” to work, and launched our dsNet system using IDAs into the market in February of 2008.

We then spent the next two years building resiliency into our system.

Resiliency that…

  • Keeps your business running regardless of demands and risks
  • Resists hackers’ attempts at stealing your data, even if it’s not encrypted
  • Makes the cost of threatening your data far too high
  • Can run despite a series of catastrophes causing multiple simultaneous failures of drives, devices and locations
  • Provides data security for data at rest or while in motion
  • Guarantees data integrity
  • Provides an “always on” architecture
  • Optimizes multi-site performance
  • Enables secure collaboration and communication

Let’s look at some of the features our product has today that deliver on resiliency, and discussion points on what to look for in these types of storage platforms.

SecureSlice™ Protection

Feature: Any group of slices less than the threshold is unrecognizable as data.

Benefits: Secures data in a multi-tenancy environment. Hardware breaches do not compromise data.

Discussion: Make sure data slices, fragments, or packets are completely transformed and don’t contain any original data to ensure data security.

PerfectBitsAssurance

Feature: Integrity checks on both individual slices and data files.

Benefits: Guarantees bit-perfect data storage and delivery. Guarantees data cannot be corrupted during file updates even under adverse conditions.

Discussion: Don’t rely on hardware for data integrity – make sure the storage system enforces data integrity in software.
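To make the idea of checking both individual slices and whole files concrete, here is a minimal sketch in Python. It is not our product code: the function names, the manifest layout, and the use of SHA-256 are assumptions for illustration, and the "slices" here are plain byte ranges rather than dispersed slices.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest as hex (the hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()

def store(data: bytes, slice_size: int):
    """Cut data into pieces and record a digest per piece plus one for the whole file."""
    slices = [data[i:i + slice_size] for i in range(0, len(data), slice_size)]
    manifest = {"file": digest(data), "slices": [digest(s) for s in slices]}
    return slices, manifest

def verify(slices, manifest):
    """Return the indexes of damaged pieces and whether the reassembled file matches."""
    bad = [i for i, s in enumerate(slices) if digest(s) != manifest["slices"][i]]
    file_ok = digest(b"".join(slices)) == manifest["file"]
    return bad, file_ok

slices, manifest = store(b"hello dispersed storage", 8)
slices[1] = b"XXXXXXXX"  # simulate silent corruption of one piece
print(verify(slices, manifest))
```

The per-slice digests say which piece to rebuild; the file-level digest confirms the reassembled result end to end.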

Tunable data protection

Feature: Set the Dispersed Storage to a multitude of M of N combinations.

Benefits: Match reliability to application requirements and ensure acceptable fault tolerance of storage nodes and sites.

Discussion: Make sure your solution has data protection tunability. Without tunability, fixed configurations may not map well to existing infrastructures.
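The trade-off behind an M of N setting can be worked out with simple arithmetic. The sketch below assumes each slice is 1/M the size of the original data, as in a classic information dispersal scheme; the function name and the example settings are illustrative only and do not describe any shipping configuration.

```python
from fractions import Fraction

def dispersal_profile(m: int, n: int) -> dict:
    """Summarize an M of N setting: any m of the n slices rebuild the data."""
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    expansion = Fraction(n, m)  # raw bytes stored per byte of user data
    return {
        "tolerated_losses": n - m,
        "raw_per_usable": float(expansion),
        "overhead_pct": float((expansion - 1) * 100),
    }

for m, n in [(10, 16), (12, 16), (1, 3)]:
    print(m, n, dispersal_profile(m, n))
```

The last setting, 1 of 3, is simply three full copies: it tolerates two losses at 200% overhead, while 10 of 16 tolerates six losses at 60% overhead under the same assumption.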

Rebuilder™ Process

Feature: A distributed rebuilding process works across all storage nodes.

Benefits: No single point of failure or choke point assures highly scalable reliability.

Discussion: Make sure your solution has a distributed rebuilding approach so that storage nodes handle rebuilding, rather than taxing servers that are also responsible for access throughput.

SmartRead™ Performance

Feature: Predicts the optimal network routes and storage nodes for reading the minimal number of slices to most efficiently return data.

Benefits: High level of read performance guaranteed even with failure conditions. Minimizes network traffic.

Discussion: Look for solutions that optimize performance for distributed storage systems by ranking the highest performing storage nodes in real-time.
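As a rough illustration of ranking nodes and reading only the minimum number of slices, the sketch below picks the threshold number of reachable nodes with the lowest observed latency. The function name, the latency table, and the use of `None` for an unreachable node are all assumptions; a real system would weigh more signals than latency.

```python
import heapq

def pick_read_set(latency_ms: dict, threshold: int) -> list:
    """Choose the `threshold` reachable nodes with the lowest observed latency."""
    reachable = {node: ms for node, ms in latency_ms.items() if ms is not None}
    if len(reachable) < threshold:
        raise RuntimeError("not enough reachable nodes to meet the threshold")
    # nsmallest returns the nodes ordered from fastest to slowest
    return heapq.nsmallest(threshold, reachable, key=reachable.get)

observed = {"a": 12.0, "b": None, "c": 3.5, "d": 40.0, "e": 7.1}
print(pick_read_set(observed, 3))  # ['c', 'e', 'a']
```

Node "b" is down and node "d" is slow, so neither is read; the data can still be returned because only the threshold number of slices is needed.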

Access Software Client

Feature: SDK that enables reading and writing directly to storage nodes versus having to go through an additional server.

Benefits: Avoids choke points of a gateway, enabling massive parallel reads and writes. Can be embedded directly into devices.

Discussion: Look for solutions with storage nodes so that a software client can speak directly to the storage. Solutions with servers and disk enclosures will require access through the server, preventing massive parallel reads and writes.
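The difference between going through a gateway and talking to storage nodes directly is easiest to see in code. This sketch is not our SDK: `read_slices`, the node table, and the caller-supplied `fetch` function are hypothetical names used only to show one request per node being issued concurrently from the client.

```python
from concurrent.futures import ThreadPoolExecutor

def read_slices(nodes: dict, object_id: str, fetch) -> dict:
    """Ask every node for its slice at the same time.

    `fetch(address, object_id)` is supplied by the caller and returns bytes.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {name: pool.submit(fetch, addr, object_id)
                   for name, addr in nodes.items()}
        return {name: future.result() for name, future in futures.items()}

def fake_fetch(address: str, object_id: str) -> bytes:
    """Stand-in for a network call to one storage node."""
    return f"{object_id}@{address}".encode()

print(read_slices({"n1": "10.0.0.1", "n2": "10.0.0.2"}, "obj-42", fake_fetch))
```

Because each request goes straight to a node, total read time tracks the slowest node contacted rather than the sum of all requests queued behind one server.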

We figure six features are enough to get started with, and we look forward to sharing many more new features to deliver resiliency with our upcoming software announcement.

Thanks to all the emerging solutions that validate our approach.

Federal CIO challenges with mandate towards cloud computing

A recent InfoWorld Cloud Computing article by David Linthicum covered the pending mandate on cloud computing usage for government agencies.

The article cites a December Channel Insider article that explains:

According to various published reports, the OMB will mandate in the fiscal year 2011 (which starts in October 2010) that federal agencies not using cloud computing or making cloud computing part of new IT projects explain why. By fiscal year 2013, the policy will require agencies to provide details and road maps on their plans for adopting cloud-based technologies.

With the OMB pushing towards Cloud Computing, the question is what challenges exist for CIOs?

The largest issue I see for Federal CIOs moving to the cloud is addressing security, particularly of data. Data security in stand-alone systems relies on securing the perimeter, but in a cloud, data is commingled on the same hardware. Current guidance for securing data in the cloud is to encrypt it, but encryption introduces additional challenges such as key management, and the requirement to decrypt prior to search and compute.

People are already addressing these items – the Homeland Security Newswire recently discussed how researchers are working on being able to search encrypted documents.

And some storage providers (such as us) are working on different methods to actually store the data itself, for example, using Information Dispersal Algorithms (IDAs) to bit-split data into slices, which results in no entire copy of the data residing on any hardware, and is essentially encrypted data without the key management issue.
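For readers who want to see what dispersal looks like mechanically, here is a toy sketch in the style of a textbook information dispersal algorithm, working modulo the prime 257. It is a stand-in and not a description of our implementation: the function names, the choice of field, and the zero padding are all assumptions. It illustrates only the property that any M of the N slices rebuild the data, not the confidentiality properties discussed above.

```python
P = 257  # smallest prime above 255, so every byte value fits in the field

def _invert(matrix):
    """Invert a square matrix modulo P using Gauss-Jordan elimination."""
    size = len(matrix)
    aug = [row[:] + [int(r == c) for c in range(size)]
           for r, row in enumerate(matrix)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse, Python 3.8+
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def disperse(data: bytes, m: int, n: int) -> dict:
    """Return {slice_number: symbols}; each slice holds one symbol per m input bytes."""
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    groups = [padded[i:i + m] for i in range(0, len(padded), m)]
    return {x: [sum(c * pow(x, j, P) for j, c in enumerate(g)) % P for g in groups]
            for x in range(1, n + 1)}

def recover(slices: dict, m: int, length: int) -> bytes:
    """Rebuild the original bytes from any m slices."""
    if len(slices) < m:
        raise ValueError("fewer slices than the threshold")
    xs = sorted(slices)[:m]
    inverse = _invert([[pow(x, j, P) for j in range(m)] for x in xs])
    out = bytearray()
    for k in range(len(slices[xs[0]])):
        column = [slices[x][k] for x in xs]
        out.extend(sum(a * v for a, v in zip(row, column)) % P for row in inverse)
    return bytes(out[:length])

data = b"dispersed storage"
slices = disperse(data, 3, 5)
survivors = {x: slices[x] for x in (2, 4, 5)}  # slices 1 and 3 are lost
print(recover(survivors, 3, len(data)) == data)
```

Each slice is roughly a third of the input in this 3 of 5 example, and every stored symbol is a combination of three input bytes rather than a copy of any one of them.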

A great read is the Cloud Security Alliance’s Security Guidance for Critical Areas of Focus in Cloud Computing, which helps in understanding requirements.

Federal CIOs are going to have to take a closer look at their storage platforms to see if secure data is an intrinsic characteristic, or a bolt-on, and question if the bolt-on approach is going to work in the long run.

Cleversafe receives Wikibon CTO award for the Best Storage Technology Innovations of 2009

Cleversafe has been recognized as a leader by Wikibon industry veterans with the Wikibon CTO award for the Best Storage Technology Innovations of 2009. Here’s what Wikibon had to say about Cleversafe:

Wikibon believes that this type of technology is a good strategic fit for cloud storage, especially for archive and media data. The overhead for splitting it up in to 16 slices is 25%, making it half the price of a traditional protected storage. The alternative is no backup at all for cloud storage, an approach being more popular but harder to detect.

Cleversafe introduced dispersed storage technology commercially in 2008 after three years of research and development. This award validates the enhancements Cleversafe has made since product launch. Products that meet cloud requirements are not going to be built overnight, and Cleversafe has demonstrated diligent and steady progress in serving the requirements that a distributed, multi-tenancy storage platform demands.

We are also pleased with our product portfolio progress. Look for announcements in the next month from us that further validate Cleversafe as a leader in massively scalable systems designed with inherent data security, particularly for data in the cloud.

Click here for the full post at Wikibon.

Other winners include Storwize, and Unisys.

Thoughts on Mexico Federal Government Conference

On January 20th, I attended a conference on cloud computing and cloud storage hosted by the Mexican federal government, which is looking at how it can use cloud systems to provide more efficient and more reliable IT systems, both for internal use and to better deliver services to its citizens.  They selected eight leading cloud computing and cloud storage vendors to present their cloud strategies:  IBM, EMC, Google, Sun, HP, Cisco, Hitachi, and Cleversafe.  There’s been a lot of press coverage lately of “cloud”, so it was quite interesting to see the presentations and directly understand the state of cloud technology and what each company is providing.

Surprisingly, six of these companies were consistent in not discussing how their products have evolved to actually address the requirements of the cloud.

Was it designed to work in the cloud?

IBM presented a definition of cloud computing and spoke very generally about the elements and benefits of cloud computing.  Although IBM didn’t present any new technologies for cloud, they have a very long history in the technology approaches leading up to cloud and are integrating various solutions for the cloud market.

Carlos Medinas of Cisco presented a unified architecture for cloud computing which promised higher efficiency.  Understandably, Cisco views the world as a series of detailed network diagrams, which results in a presentation of a series of detailed network diagrams.  Clearly, cloud computing and cloud storage will utilize the Internet network infrastructure provided primarily by Cisco, so they will play a key role; however, Cisco will need to work with various partners to deliver solution-level benefits.

Antonio Guerraro of HP (who was formerly in EDS Mexico) then spoke on the business process capabilities of cloud computing.  HP’s three strategies are to:

  • Help customers secure, manage and administer cloud systems
  • Enable providers of cloud services
  • Provide cloud services through partners like Gobble, Print 2 Cloud, MacCloud, Snapfish, etc.

At this point in the presentation, it became clear that the standard cloud presentation was to 1) define the domain of cloud computing, 2) talk about how your company has been working on cloud-like capabilities since the dawn of (digital) time and then 3) show a big matrix of capabilities that shows that your company knows everything and can do everything – thus providing a complete solution.  As had IBM and Cisco, HP presented this approach which then was repeated by Sun, EMC and Hitachi.

Sun’s unique spin was that cloud-like capabilities were so integral to the company that they were its slogan, “The Network is the Computer”.  No doubt Sun has always been on this theme and has played a leadership role with technologies like Java, Ethernet workstations, etc.  With the Oracle merger concluding, it will be interesting to see how that affects Sun’s (and Oracle’s) strategy.

EMC understandably featured their virtual computing capabilities through VMware as well as their partnership with Cisco and VMware for providing cloud solutions.  They then positioned their Atmos product as COS – Cloud Optimized Storage – a new category of storage like NAS or SAN.  EMC touted the capability of Atmos to manage data sets using policies.  For example, you can designate policy levels of “bronze” and “gold” which would each correspond to different reliability levels, with higher numbers of copies for data associated with the gold policy.  EMC said that Atmos would automatically make multiple copies as needed, which they illustrated by showing a map of Mexico with “4, 5 or ‘n’ copies” scattered across the map.  I can certainly see why EMC would want to sell a system that automatically makes 4, 5 or ‘n’ copies, but it is hard to see how most customers would find that approach cost effective.

As with all the prior presenters, Hitachi Data Systems led off by defining the cloud storage market.  Hitachi’s definition included three segments:  public cloud, private cloud and content repository systems.  Hitachi asserted their differentiator as being able to provide different storage interfaces – NAS and SAN – on the same pool of drives.  Hitachi also has a “dynamic provisioning” system and retention (compliance) policies, and positions itself around the differentiators of integration, reliability & security and scalability.

Each of these prior six presenters told a similar story and it seems as though many vendors are talking about how they have the breadth and depth to cover cloud, but aren’t giving any concrete examples of how their products have fundamentally evolved to address the requirements of a massively scalable, multi-tenancy system.

John Farrell of Google was up next and it was refreshing that he had something different to say.  He started at a very high level by referring to the book The Big Switch by Nicholas Carr.  The general theme was that computing and applications are much more efficient when delivered as a utility, like electricity.  John also provided some insights into the improved cost efficiency – specifically the life cycle cost – of Gmail vs. operating an internal email system.

I then concluded the conference by stepping back and looking at long term technology trends to answer the question of whether cloud computing and cloud storage are just an industry fad or a genuine shift.  My presentation then used this technology megatrends framework to look in detail at where cloud storage is headed and how we’ll get there, which is the subject of a future post…

Cleversafe Ranks Third in StorageMonkeys Top Storage Vendor Blogs – 2010

Cleversafe is pleased to announce our 3rd place finish in the StorageMonkeys Top Storage Vendors Blogs for 2010. You may be confused based on the official results posted on StorageMonkeys that placed Cleversafe in 21st.

We are not calling for a revote, but perhaps a more equitable calculation is in order. You review the math…

Rank  Company           Blog                              Votes  Employees*  Votes per employee
1     Zetta             Zetta blog                           39          25  156.00%
2     Nirvanix          Stephen Foskett                      55          38  144.74%
3     Cleversafe        Cleversafe blog                      28          35   80.00%
4     Ocarina Networks  Carter George & Sunshine Mugrabi     20          50   40.00%
5     Sepaton           Jay Livens                           19          87   21.84%
6     3Par              Mark Farley                          66         614   10.75%
7     Xiotech           Xiotech blog                         24         300    8.00%
8     Pillar            Mike Workman                         17         500    3.40%
9     FalconStor        Chris Poelker                        13         505    2.57%
10    HDS               Hu Yoshida                           67        2700    2.48%
11    NetApp            Val Bercovici                       107        7976    1.34%
12    HDS               Michael Hay                          36        2700    1.33%
13    HDS               David Merril                         36        2700    1.33%
14    NetApp            Vaughn Stewart                      103        7976    1.29%
15    NetApp            Dave Hitz                            95        7976    1.19%
16    HDS               Pete Gerr                            30        2700    1.11%
17    NetApp            Nick Triantos                        63        7976    0.79%
18    NetApp            Alex McDonald                        56        7976    0.70%
19    NetApp            Larry Freeman                        48        7976    0.60%
20    EMC               Chuck Hollis                        205       42000    0.49%
21    NetApp            Storage Efficiency                   36        7976    0.45%
22    EMC               Mark Twomey                         182       42000    0.43%
23    EMC               Barry Burke                         175       42000    0.42%
24    EMC               Dave Graham                         153       42000    0.36%
25    HP                StorageWorks                         99      321000    0.03%
26    Sun               Brendan Gregg                         6       27596    0.02%
27    Sun               Adam Leventhal                        4       27596   0.014%
28    IBM               Barry Whyte                          41      398455   0.010%

StorageMonkeys ranked the Storage Blogs based on the number of votes. We all know that a company with tens of thousands of employees should have more leverage to bring out the vote when compared to companies with considerably fewer resources. Cleversafe’s ranking calculation is based on the number of votes as a percentage of the number of employees in the company.

Because at the end of the day, isn’t it all about how you slice the data?
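For anyone who wants to check the math, the calculation is just votes divided by employees, expressed as a percentage and sorted from highest to lowest. The short sketch below uses four rows from the table; the function name is ours for illustration.

```python
rows = [  # (blog, votes, employees) taken from the table above
    ("EMC (Chuck Hollis)", 205, 42000),
    ("Cleversafe", 28, 35),
    ("NetApp (Val Bercovici)", 107, 7976),
    ("Zetta", 39, 25),
]

def votes_per_employee_pct(votes: int, employees: int) -> float:
    """Votes as a percentage of headcount, rounded to two decimals."""
    return round(votes * 100 / employees, 2)

ranked = sorted(rows, key=lambda r: r[1] / r[2], reverse=True)
for blog, votes, employees in ranked:
    print(blog, votes_per_employee_pct(votes, employees))
```

Sorting by raw votes instead would put EMC first and Cleversafe last among these four, which is exactly the reversal the table is poking fun at.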

The real question is, do these blogs have great content? Here are a few links to recent posts – you decide.
Major Trends at StorageVisions 2010
Silent Errors
Trends in the Advancement of Storage Virtualization

* Employee numbers were gathered from the internet and are not guaranteed to be accurate as of today.