Archive

Cleversafe to discuss security in the cloud at GIIC Annual Meeting

Chris Gladwin, our founder and CEO, will participate in the panel “Security on the Cloud – more or less secure than behind your firewall?” at the Global Information Infrastructure Commission Annual Meeting held Friday, May 28th. This panel will also include industry leaders from Accenture and the National Institute of Standards and Technology, and will be moderated by Michael R. Nelson, Visiting Professor, Internet Studies, Communication, Culture and Technology Program from Georgetown University.

The 2010 annual meeting will focus overall on Internet security, and will also include a keynote address by Vint Cerf from Google, a presentation on critical infrastructure protection from Alan Paller, director of research, SANS Institute, and a second panel discussion on “Internet Protocol – Ensuring a Secure Environment” with industry luminaries from Huawei, Electronic Warfare Associates, and Telcordia Cyber Security, among others. The GIIC aims to be a forum for the development of ideas and actions on key policy issues for the high technology industry, and the annual meeting is an opportunity for government leaders and C-level executives to share ideas and address challenges to the ICT industry.

Next-generation disaster recovery: as more organizations adopt cloud, what will be the essential elements as DR shifts from hardware to cloud?

A recent study of the Most Important IT Priorities for 2010 conducted by ESG shows that Data Backup and Recovery, along with Disaster Recovery (DR) programs, are still top-of-mind for most organizations (see Figure 1). However, the way these services are provided is beginning to change with the adoption of Cloud storage initiatives. Traditional DR solutions, which rely on a company’s own hardware, are typically very slow and prone to failure: they may involve many complex manual steps, are difficult to test, and require expensive duplication of the production data center infrastructure to ensure a reliable recovery. As a result, organizations find themselves unable to provide sufficient disaster recovery protection for more than a small subset of their production systems, and even for those, recovery may take days or weeks. Additionally, maintaining duplicate systems at both the primary production site and the recovery site(s) is very costly.

Figure 1. Most Important IT Priorities for 2010

Before we get into how the strategies behind effective DR are changing, let’s take a moment to define the two main metrics used to measure the success of any DR implementation: Recovery Time Objective and Recovery Point Objective. Recovery Time Objective (RTO), as defined by Wikipedia, is “…the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.” For example, some companies may be willing to tolerate up to 24 hours of system downtime, while others (e.g., online retailers such as Amazon) require RTOs of seconds to minutes to prevent losses in the millions of dollars.

Recovery Point Objective (RPO), as defined by Wikibon, is “…the maximum acceptable level of data loss following an unplanned ‘event’, like a disaster (natural or man-made), act of crime or terrorism, or any other business or technical disruption that could cause such data loss.” Interestingly, RPO is measured in units of time (typically hours or days). It represents the point in time, prior to such an event or incident, to which lost data can be recovered from the most recent backup copy. For instance, one company may be willing to accept the loss (and subsequent manual re-creation) of an entire day’s worth of transactions, while another may be willing to tolerate the loss of only the transactions that were in process at the very moment the system failed (typical of an online retailer).
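The following minimal Python sketch makes the two metrics concrete. It assumes a simple periodic-backup scheme, in which the worst-case data loss (what RPO bounds) is one full backup interval and the recovery time (what RTO bounds) is however long a restore takes. The `DRPlan` class and its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DRPlan:
    """Hypothetical DR plan parameters, for illustration only."""
    backup_interval: timedelta   # time between successive recoverable copies
    restore_duration: timedelta  # time needed to bring the service back after a disaster


def worst_case_data_loss(plan: DRPlan) -> timedelta:
    # If disaster strikes just before the next backup completes, everything
    # written since the previous backup is lost: up to one full interval.
    return plan.backup_interval


def meets_objectives(plan: DRPlan, rto: timedelta, rpo: timedelta) -> bool:
    # RTO bounds how long recovery may take; RPO bounds how far back in time
    # the recovered data may be.
    return plan.restore_duration <= rto and worst_case_data_loss(plan) <= rpo


nightly = DRPlan(backup_interval=timedelta(hours=24), restore_duration=timedelta(hours=8))
print(meets_objectives(nightly, rto=timedelta(hours=24), rpo=timedelta(hours=24)))   # True
print(meets_objectives(nightly, rto=timedelta(minutes=5), rpo=timedelta(seconds=0)))  # False
```

A nightly backup with an eight-hour restore satisfies a company that tolerates a day of downtime and a day of lost transactions. It does not satisfy an online retailer that needs minutes of downtime and near-zero data loss.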

Now that we have defined Recovery Time Objective and Recovery Point Objective, let’s look at how the strategies for meeting them have changed with the advent of Cloud Storage. Cloud Storage providers must assume that customers’ RTO and RPO requirements have not changed: they are still based on each customer’s needs, not on what the Cloud Storage provider can support. With traditional DR implementations, production systems had to be replicated, resulting in a very inefficient use of capital resources. Contrast this with a Cloud Storage provider’s DR strategy, which leverages the efficiencies of a shared virtual data center and therefore does not require the customer to invest in additional hardware to replicate its production data to another physical location. Instead, data can be replicated within the same Cloud Storage pool that serves the production data (the copies may be physically located on different hardware within the virtual data center). This eliminates the need for a secondary recovery facility and all of its associated CAPEX and OPEX costs.

Cleversafe takes this strategy one step further by eliminating the need to replicate data within the Cloud. Instead, it uses Information Dispersal Algorithms to store data securely and efficiently, with confidence that multiple storage node failures, or even site failures, will not result in data loss. Customers are concerned that their data may be compromised within the Cloud and, worse, that they may lose access to it in the event of a catastrophic failure. Many companies are trying to provide reassurance that data will not be lost when disaster strikes, but only Cleversafe is able to ensure, through true geographic dispersal, that data remains highly available and at the same time completely secure. The real challenge for Cloud storage service providers is to replace the previously accepted method of replication, and its associated costs, with a much more sophisticated and efficient form of disaster recovery based on dispersal algorithms.
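The core idea behind an IDA can be sketched in a few lines of Python. This is a minimal illustration, not Cleversafe’s implementation. It uses polynomial interpolation over the prime field GF(257), whereas production systems typically use GF(2^8) arithmetic with matrix-based codes. The function names and the 10-of-16 configuration are hypothetical. Each group of k data bytes defines a polynomial, and each of the n slices stores one extra evaluation of it. Any k slices therefore determine the polynomial, and with it the data.

```python
P = 257  # prime field; every byte value 0..255 is a field element


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through the (xi, yi) points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def disperse(data: bytes, k: int, n: int) -> list[list[int]]:
    """Split data into n slices, any k of which suffice to rebuild it.
    Slice values can reach 256, so slices are lists of ints, not bytes."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = [[] for _ in range(n)]
    for off in range(0, len(padded), k):
        # k data bytes are the values of a degree-(k-1) polynomial at x = 0..k-1
        chunk = list(zip(range(k), padded[off:off + k]))
        for s in range(n):
            # slice s holds the polynomial's value at x = k + s, never a raw data byte
            slices[s].append(_lagrange_eval(chunk, k + s))
    return slices


def recover(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k slices, keyed by slice index."""
    if len(available) < k:
        raise ValueError(f"need {k} slices, have {len(available)}")
    chosen = sorted(available.items())[:k]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(k + s, values[pos]) for s, values in chosen]
        out.extend(_lagrange_eval(points, x) for x in range(k))
    return bytes(out[:length])


msg = b"dispersed storage network"
slices = disperse(msg, k=10, n=16)
# Lose any six slices (for example, every slice at one site) and the data still comes back.
survivors = {i: slices[i] for i in range(16) if i not in {0, 3, 5, 7, 11, 15}}
print(recover(survivors, k=10, length=len(msg)))  # b'dispersed storage network'
```

The raw storage cost is n/k, which is 1.6x for 10-of-16, compared with 2x or 3x for full copies. Dispersal by itself is about availability. In this sketch, fewer than k slices can still leak partial information, so a system that needs sub-threshold slices to be unrecognizable would have to pair it with an additional transform.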

No Wonder People get Confused about Cloud Security

Another week goes by in cloud computing with more vendor hyperbole leaving us and a few others scratching our heads. ParaScale’s recent announcement is a good example. We took the liberty of pulling a few quotes for reference:

“ParaScale also provides highly scalable data encryption that secures data without requiring storage of keys within the cloud or third-party key management. Cloud storage management functions such as replication and migration are performed on encrypted content with multiple protocols and algorithms supported.”

Source: Computer Technology Review

“With ParaScale’s keyless encryption, a user’s authentication into the back-end system generates an encryption key on the fly to write to the user’s apportioned virtual file system (VFS). A similar process will allow users secure access to reading data.”

Source: SearchStorage

“The enhanced security measures also means that data on stolen discs or nodes can’t be accessed unless the thief has the dedication to make an exact copy of your storage cluster on their end, Norris says.”

Source: Computer Technology Review

From our standpoint, although ParaScale claims highly secure data encryption, no details are offered regarding how the keys are generated, or where, if at all, the encryption keys are stored within ParaScale’s system.  It would be incorrect to assume encrypting one’s data is the end of the story when it comes to storing data securely.  The biggest outstanding question, and where most people go wrong in their implementation, is in how keys are managed and secured.  Without this information it is impossible to evaluate the security of ParaScale’s approach.

Since there doesn’t appear to be much information available to the public, we attempted to fill in some of the details we thought were missing from their release. Below we consider several possibilities that fit the scant information available and analyze the pros and cons of each approach.

The first possibility is that ParaScale is generating keys randomly.  Since the process of generating encryption keys is random, decrypting the data requires recovering the same randomly generated key from some location.  If ParaScale is taking this approach, the question is how and where are these keys being secured?  Are they stored on other disks, other nodes?  What process is able to recover these keys and what stops an attacker from mimicking that process?

One possible answer is that ParaScale’s system maintains an internal private key, perhaps one for each “virtual file system,” which is used to encrypt the random keys; the encrypted keys are then stored alongside the encrypted data. If this is the case, where are the private keys stored, and what would happen to the user’s data if a data loss event affected the private key? If replication is used to protect the keys against loss, then each copy is another vector for attack.
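The scheme just described is commonly called envelope encryption. The minimal Python sketch below assumes a per-VFS `master_key`, our hypothetical name for the internal private key, which wraps a fresh random data key for each object. To stay within the standard library, it uses a toy stream cipher built from HMAC-SHA256 in counter mode. A real system would use a vetted cipher such as AES-GCM. This is a generic illustration, not ParaScale’s confirmed design.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with HMAC-SHA256(key, nonce || counter) blocks.
    Stand-in for a real cipher such as AES-GCM; it provides no integrity."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def store(master_key: bytes, plaintext: bytes) -> dict:
    data_key = secrets.token_bytes(32)  # fresh random key per object
    data_nonce, wrap_nonce = secrets.token_bytes(16), secrets.token_bytes(16)
    return {
        "ciphertext": _keystream_xor(data_key, data_nonce, plaintext),
        # The data key is encrypted ("wrapped") under the master key and kept with the data.
        "wrapped_key": _keystream_xor(master_key, wrap_nonce, data_key),
        "nonces": (data_nonce, wrap_nonce),
    }


def load(master_key: bytes, record: dict) -> bytes:
    data_nonce, wrap_nonce = record["nonces"]
    data_key = _keystream_xor(master_key, wrap_nonce, record["wrapped_key"])
    return _keystream_xor(data_key, data_nonce, record["ciphertext"])


master_key = secrets.token_bytes(32)  # the one secret every object depends on
record = store(master_key, b"quarterly results")
print(load(master_key, record))  # b'quarterly results'
```

Every wrapped key travels with the data it protects. As a result, the security of every object reduces to the secrecy of `master_key`, and any replica of that one key is a target.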

A second possibility is that the keys are generated deterministically instead of randomly. The advantage of generating keys deterministically is that no keys need to be stored anywhere. The way this would work is that some set of information related to the write request is fed into a function to derive a key. Then, when reading the data, the same information would be supplied again to regenerate the same key and decrypt the data. For example, that information might consist of the “virtual file system id”, “the id of the node where the data is stored”, “the name of the data chunk being stored”, etc.

This seems to fit with the quote from Norris: “data on stolen discs or nodes can’t be accessed unless the thief has the dedication to make an exact copy of your storage cluster on their end”. If this is their approach, an attack would be a little easier than creating an exact copy of the storage cluster: the attacker just needs to be able to predict, or sufficiently narrow down, the set of parameters that go into the key derivation function. In our example, if the attacker can guess the node id, the vfs id, and the chunk name, they could trivially derive the key to decrypt the data. Predicting this metadata is likely to be much easier than guessing a randomly generated key. Therefore, while this approach is much simpler than managing keys, its security suffers significantly.
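The minimal Python sketch below illustrates this second possibility. It assumes the key is simply a SHA-256 hash of the write-request metadata, and the function names and identifier formats are hypothetical. It shows why the approach is weak: the attacker never needs the key, only a good guess at its inputs.

```python
import hashlib


def derive_key(vfs_id: str, node_id: str, chunk_name: str) -> bytes:
    """Hypothetical deterministic key derivation from write-request metadata.
    There is no secret input: anyone who knows the metadata can derive the key."""
    material = "\x00".join((vfs_id, node_id, chunk_name)).encode()
    return hashlib.sha256(material).digest()


def guess_key_inputs(target: bytes):
    """Brute-force plausible metadata values until one reproduces the target key."""
    for v in range(100):
        for n in range(16):
            for c in range(200):
                candidate = (f"vfs-{v:04d}", f"node-{n:02d}", f"chunk-{c:06d}")
                if derive_key(*candidate) == target:
                    return candidate
    return None


# The storage system derives this key when the chunk is written...
stored_key = derive_key("vfs-0042", "node-07", "chunk-000193")
# ...and an attacker who knows the naming scheme recovers the same inputs.
print(guess_key_inputs(stored_key))  # ('vfs-0042', 'node-07', 'chunk-000193')
```

Here the attacker’s search space is 100 × 16 × 200 = 320,000 candidates, which is trivially enumerable. A random 256-bit key, by contrast, has 2^256 possible values.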

Finally, there was the quote from SearchStorage stating that the user’s authentication into the system is what generates the key. The third possibility, then, is that the user’s credentials are used to generate the encryption key. This is similar to the previous approach, except that something like the user’s password, rather than information related to the write request, is used to generate the encryption key. The limitation here is that users’ passwords are significantly easier to brute force than a 128-bit or 256-bit encryption key. The other downside is that you would not be able to change your password without first re-encrypting the data or keys protected by the old password.
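The password-based variant can be sketched with Python’s standard `hashlib.pbkdf2_hmac`. The choice of PBKDF2, the salt handling, and the sample passwords are our assumptions, since the source does not say how credentials would be turned into a key. The sketch shows both downsides. A dictionary attack is just the legitimate derivation run over likely passwords, and a new password yields a different key.

```python
import hashlib
import secrets


def key_from_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # PBKDF2 makes each guess slower, but the attacker still only has to search
    # the space of likely passwords, not the 2**256 possible keys.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


# The salt must be stored so the key can be re-derived; assume the attacker sees it too.
salt = secrets.token_bytes(16)
key = key_from_password("summer2010", salt)

# A dictionary attack re-runs the same derivation over candidate passwords.
candidates = ["password", "letmein", "summer2010", "cloud123"]
cracked = next((p for p in candidates if key_from_password(p, salt) == key), None)
print(cracked)  # summer2010

# Changing the password changes the key, so data (or wrapped keys) protected
# by the old key must be re-encrypted before the old password is discarded.
print(key_from_password("n3w-passphrase", salt) == key)  # False
```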

It is impossible for us to evaluate the security of ParaScale’s approach without more knowledge of where the keys are stored within the system. However, this much is clear: since the ParaScale system itself generates, manages and uses these keys, there are likely one or more control nodes within the system that represent a single point of compromise. While their encryption may protect against theft of a disk or node, the remote compromise of an online node would yield keys or decrypted user data. For cloud environments, the most secure approach would be for the end-user, and only the end-user, to be in charge of putting their data back together, rather than having the cloud storage provider decrypt data for them.

Cleversafe’s approach to key management

Our approach is not only clearly explained but also relies on well-known, well-analyzed techniques for achieving data security. Moreover, it places control of assembling and decrypting data squarely in the hands of the end-user while still avoiding the need for key management.

When we made our 2.0 announcement, which included our SecureSlice™ technology, we discussed keyless encryption. With SecureSlice™ technology, a service provider cannot go through a public cloud to access customer data or the encryption keys. Only end-customers with credentials control who can actually access their data. As such, the security of data comes down to the access control mechanism each customer puts in place.
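The post doesn’t spell out how SecureSlice™ achieves keyless encryption. One published technique that matches the description is the all-or-nothing transform (AONT). Data is encrypted under a random one-time key, and that key is then masked with a hash of the ciphertext and appended. No key is stored anywhere, and the key can only be recovered by someone holding the entire package. The Python sketch below illustrates that technique under our own assumption; it does not describe SecureSlice™ internals. The function names are hypothetical, and a toy HMAC-SHA256 keystream stands in for a real cipher such as AES.

```python
import hashlib
import hmac
import secrets


def _xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with an HMAC-SHA256 counter-mode keystream (stand-in for AES).
    Omitting a nonce is tolerable here only because each key encrypts one package."""
    out = bytearray()
    for off in range(0, len(data), 32):
        pad = hmac.new(key, off.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[off:off + 32], pad))
    return bytes(out)


def aont_package(data: bytes) -> bytes:
    key = secrets.token_bytes(32)  # random, used once, never stored on its own
    ciphertext = _xor_stream(key, data)
    digest = hashlib.sha256(ciphertext).digest()
    masked_key = bytes(k ^ d for k, d in zip(key, digest))
    return ciphertext + masked_key  # the key is recoverable only from the whole package


def aont_unpackage(package: bytes) -> bytes:
    ciphertext, masked_key = package[:-32], package[-32:]
    digest = hashlib.sha256(ciphertext).digest()
    key = bytes(m ^ d for m, d in zip(masked_key, digest))
    return _xor_stream(key, ciphertext)


package = aont_package(b"customer record")
print(aont_unpackage(package))  # b'customer record'
```

Once such a package is dispersed into slices, anyone holding fewer than the threshold number of slices cannot rebuild the whole package and therefore cannot unmask the key. The end-user, who can retrieve a threshold of slices, needs no stored key at all.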

SNW Trends

Several of us attended Storage Networking World, and it was great meeting new folks at the event. Here’s our take on the main themes of discussion.

Cloud Storage

  • Cloud storage is gaining momentum – enterprises in R&D mode
  • Cloud storage is becoming standardized – CDMI v1 ratified, ANSI next
  • Object storage is being defined as the preferred method for Public Cloud Services
  • Applications transitioning from traditional storage interfaces (POSIX) to APIs

Cloud Adoption

  • Security identified as top concern for Cloud adoption
  • Cost justification is driving enterprises to examine service providers
  • Enterprise grade SLAs desired
  • Network latency a concern – how to make a WAN perform like a LAN

Green / Storage Virtualization

  • Large enterprises examining how to build green data centers
  • Efficiency trend continues to virtualize server, network, and storage resources

Scalability

  • Smaller storage admin teams managing petabytes require comprehensive systems
  • NAS protocols becoming ineffective in scale out deployments

Tiering

  • SSDs heavily discussed for tier 0 of storage

After attending many of the presentations related to Cloud, we feel our product addresses the fundamental security concerns with having a third party store your data. More on that topic in a future post.

And, it was great meeting with Dave Vellante and team from the Wikibon Project, George Crump and Eric Slack of Storage Switzerland, and Howard Marks of Network Computing / Deep Storage.

Confirmation that the future of storage is dispersal

With the recent EMC Atmos announcement, it is great to see momentum behind using Information Dispersal Algorithms (IDAs) or Forward Error Correction (FEC) to increase data protection. A growing number of vendors are beginning to agree with us that RAID and replication aren’t going to work for massive storage systems, particularly storage clouds. And we’ve always said we can’t be the only vendor out there who understands there is a better approach.

Using IDAs or FEC is the “how” of these new types of storage platforms, and it is table stakes to get started. We focused our first three years on getting the “how” to work, and launched our IDA-based dsNet system in February 2008.

We then spent the next two years building resiliency into our system.

Resiliency that…

  • Keeps your business running regardless of demands and risks
  • Resists hackers’ attempts to steal your data, even if it isn’t encrypted
  • Makes the cost of threatening data far too high
  • Can run despite a series of catastrophes causing multiple simultaneous failures of drives, devices and locations
  • Provides data security for data at rest or while in motion
  • Guarantees data integrity
  • Provides an “always on” architecture
  • Optimizes multi-site performance
  • Enables secure collaboration and communication

Let’s look at some of the features our product has today that deliver on resiliency, and discussion points on what to look for in these types of storage platforms.

SecureSlice™ Protection

Feature: Any group of slices less than the threshold is unrecognizable as data.

Benefits: Secures data in a multi-tenancy environment. Hardware breaches do not compromise data.

Discussion: Make sure data slices, fragments, or packets are completely transformed and don’t contain any original data to ensure data security.

PerfectBitsAssurance

Feature: Integrity checks on both individual slices and data files.

Benefits: Guarantees bit-perfect data storage and delivery. Guarantees data cannot be corrupted during file updates, even under adverse conditions.

Discussion: Don’t rely on hardware for data integrity – make sure the storage system addresses data integrity through software intelligence.
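The two levels of checking described above can be sketched as follows. The sketch assumes SHA-256 digests, because the source doesn’t say which checks PerfectBitsAssurance uses, and it treats slices as plain byte strings for simplicity. The function names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_object(slices: list[bytes], original: bytes) -> dict:
    # Record one digest per slice plus one for the whole object.
    return {
        "slices": slices,
        "slice_digests": [digest(s) for s in slices],
        "object_digest": digest(original),
    }


def healthy_slices(record: dict) -> list[int]:
    """Indices of slices whose contents still match their stored digest."""
    pairs = zip(record["slices"], record["slice_digests"])
    return [i for i, (s, d) in enumerate(pairs) if digest(s) == d]


def verify_object(record: dict, rebuilt: bytes) -> bool:
    # End-to-end check: catches errors the per-slice checks could miss,
    # such as a bug while reassembling slices.
    return digest(rebuilt) == record["object_digest"]


record = write_object([b"abc", b"def", b"ghi"], b"abcdefghi")
record["slices"][1] = b"dXf"                 # simulate silent corruption on one node
print(healthy_slices(record))                # [0, 2]
print(verify_object(record, b"abcdefghi"))   # True
```

A corrupted slice is excluded rather than returned to the reader, and in a dispersed system it can be rebuilt from the others. The object-level digest then confirms the reassembled result.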

Tunable data protection

Feature: Set the dispersed storage to any of a multitude of M-of-N combinations.

Benefits: Match reliability to application requirements and ensure acceptable fault tolerance of storage nodes and sites.

Discussion: Make sure your solution has data protection tunability. Without tunability, fixed configurations may not map well to existing infrastructures.
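The trade-off behind tunability can be worked through in a few lines of Python. With M-of-N dispersal, raw storage is N/M times the data size and up to N - M slices can be lost; three-way replication is the special case M = 1, N = 3. The sketch assumes node failures are independent and share the same probability p, which real deployments only approximate. The 10-of-16 configuration is just an example.

```python
from math import comb


def expansion_factor(m: int, n: int) -> float:
    """Raw storage consumed per byte of user data when any m of n slices suffice."""
    return n / m


def loss_probability(m: int, n: int, p: float) -> float:
    """Probability that more than n - m slices are unavailable at once,
    assuming each of the n nodes fails independently with probability p."""
    return sum(comb(n, f) * p**f * (1 - p) ** (n - f) for f in range(n - m + 1, n + 1))


for m, n in [(1, 3), (10, 16)]:
    print(f"{m}-of-{n}: {expansion_factor(m, n):.1f}x storage, "
          f"tolerates {n - m} failures, P(loss) = {loss_probability(m, n, 0.01):.1e}")
```

With p = 0.01, three-way replication uses 3x the storage and has a loss probability on the order of 10^-6. Under the same assumption, 10-of-16 uses 1.6x the storage and has a loss probability on the order of 10^-10.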

Rebuilder™ Process

Feature: A distributed rebuilding process works across all storage nodes.

Benefits: No single point of failure or choke point assures highly scalable reliability.

Discussion: Make sure your solution has a distributed rebuilding approach, so that the storage nodes handle rebuilding rather than taxing servers that are also responsible for access throughput.

SmartRead™ Performance

Feature: Predicts the optimal network routes and storage nodes for reading the minimal number of slices to most efficiently return data.

Benefits: High read performance guaranteed, even under failure conditions. Minimizes network traffic.

Discussion: Look for solutions that optimize performance for distributed storage systems by ranking the highest performing storage nodes in real-time.
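Here is a minimal sketch of latency-based read selection. It assumes the client keeps an exponentially weighted moving average of each node’s response time and reads only from the threshold number of fastest online nodes. The `NodeStats` type, the node names, and the 0.2 smoothing factor are hypothetical, since the source doesn’t describe how SmartRead™ actually predicts routes.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    online: bool
    latency_ms: float  # moving-average response time (hypothetical metric)


def choose_read_set(nodes: list[NodeStats], threshold: int) -> list[str]:
    """Pick the `threshold` fastest online nodes; reading more slices wastes bandwidth."""
    candidates = sorted((n for n in nodes if n.online), key=lambda n: n.latency_ms)
    if len(candidates) < threshold:
        raise RuntimeError("not enough slices available to reconstruct the data")
    return [n.name for n in candidates[:threshold]]


def update_latency(node: NodeStats, observed_ms: float, alpha: float = 0.2) -> None:
    # Exponentially weighted moving average, so rankings track real-time conditions.
    node.latency_ms = (1 - alpha) * node.latency_ms + alpha * observed_ms


nodes = [
    NodeStats("chicago-1", True, 4.0),
    NodeStats("chicago-2", False, 3.0),  # offline: skipped even though it is fastest
    NodeStats("dallas-1", True, 20.0),
    NodeStats("denver-1", True, 35.0),
]
print(choose_read_set(nodes, threshold=2))  # ['chicago-1', 'dallas-1']
update_latency(nodes[0], observed_ms=104.0)  # chicago-1 slows down (average now about 24 ms)
print(choose_read_set(nodes, threshold=2))  # ['dallas-1', 'chicago-1']
```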

Access Software Client

Feature: SDK that enables reading and writing directly to storage nodes, rather than having to go through an additional server.

Benefits: Avoids choke points of a gateway, enabling massive parallel reads and writes. Can be embedded directly into devices.

Discussion: Look for solutions with storage nodes that a software client can speak to directly. Solutions built from servers and disk enclosures require access through the server, preventing massive parallel reads and writes.
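The sketch below shows the client side of direct, parallel access. The client fans out reads to every storage node at once and returns as soon as the threshold number of slices has arrived, so there is no gateway in the path and slow or unreachable nodes do not hold up the read. The `fetchers` callables and `simulated_node` helper are hypothetical stand-ins for real network reads, and the early-exit shutdown relies on `cancel_futures`, which requires Python 3.9 or later.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_slices(fetchers: dict, threshold: int) -> dict:
    """Fetch slices from storage nodes concurrently; return once `threshold` arrive.
    `fetchers` maps a slice index to a zero-argument callable doing the network read."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=len(fetchers))
    futures = {pool.submit(fetch): idx for idx, fetch in fetchers.items()}
    try:
        for fut in as_completed(futures):
            try:
                results[futures[fut]] = fut.result()
            except OSError:
                continue  # an unreachable node just means one missing slice
            if len(results) == threshold:
                break
    finally:
        # Don't wait for slower nodes once enough slices have arrived (Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
    if len(results) < threshold:
        raise RuntimeError(f"only {len(results)} of {threshold} required slices retrieved")
    return results


def simulated_node(idx: int, delay: float, reachable: bool = True):
    """Stand-in for a network read from one storage node."""
    def fetch() -> bytes:
        time.sleep(delay)
        if not reachable:
            raise ConnectionError(f"node {idx} unreachable")
        return f"slice-{idx}".encode()
    return fetch


nodes = {
    0: simulated_node(0, 0.01),
    1: simulated_node(1, 0.02, reachable=False),
    2: simulated_node(2, 0.05),
    3: simulated_node(3, 0.60),
}
print(sorted(read_slices(nodes, threshold=2)))  # [0, 2]
```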

We figure six features is enough to get started with, and we look forward to sharing many more new features to deliver resiliency with our upcoming software announcement.

Thanks to all the emerging solutions that validate our approach.