Tag Archive for 'key management'

No Wonder People get Confused about Cloud Security

Another week goes by in cloud computing with more vendor hyperbole leaving us and a few others scratching our heads. ParaScale’s recent announcement is a good example. We took the liberty of pulling a few quotes for reference:

“ParaScale also provides highly scalable data encryption that secures data without requiring storage of keys within the cloud or third-party key management. Cloud storage management functions such as replication and migration are performed on encrypted content with multiple protocols and algorithms supported.”

Source: Computer Technology Review

“With ParaScale’s keyless encryption, a user’s authentication into the back-end system generates an encryption key on the fly to write to the user’s apportioned virtual file system (VFS). A similar process will allow users secure access to reading data.”

Source: SearchStorage

“The enhanced security measures also means that data on stolen discs or nodes can’t be accessed unless the thief has the dedication to make an exact copy of your storage cluster on their end, Norris says.”

Source: Computer Technology Review

From our standpoint, although ParaScale claims highly secure data encryption, no details are offered regarding how the keys are generated, or where, if at all, the encryption keys are stored within ParaScale’s system.  It would be incorrect to assume encrypting one’s data is the end of the story when it comes to storing data securely.  The biggest outstanding question, and where most people go wrong in their implementation, is in how keys are managed and secured.  Without this information it is impossible to evaluate the security of ParaScale’s approach.

Since there doesn’t appear to be much information available to the public, we attempted to fill in some of the details we thought were missing from their release.  Below we consider several possibilities that fit with the scant amount of information available, and analyze the pros and cons of each approach.

The first possibility is that ParaScale is generating keys randomly.  Since the process of generating encryption keys is random, decrypting the data requires recovering the same randomly generated key from some location.  If ParaScale is taking this approach, the question is how and where are these keys being secured?  Are they stored on other disks, other nodes?  What process is able to recover these keys and what stops an attacker from mimicking that process?

One possible answer is that ParaScale’s system maintains an internal private key, perhaps for each “virtual file system” which is used to encrypt the random keys and then store those encrypted keys with the encrypted data. Should this be the case, where are the private keys being stored, and what would happen to the user’s data should there be data loss affecting the private key?  If replication is used to protect the keys against loss, then each copy is another vector for attack.

A second possibility is that the keys are generated deterministically instead of randomly.  The advantage of generating keys deterministically is that no keys need to be stored anywhere.  The way this would work is that some set of information related to the write request is entered into a function to derive a key.  Then when reading the data, the same information could be available to generate the same key and decrypt the data.  For example, that information might consist of the “virtual file system id”, “the id of the node where the data is stored”, “the name of the data chunk being stored”, etc.

This seems to fit with the quote from Norris: “data on stolen discs or nodes can’t be accessed unless the thief has the dedication to make an exact copy of your storage cluster on their end”.  If this is their approach, an attack would be somewhat easier than creating an exact copy of the storage cluster: the attacker just needs to be able to predict, or sufficiently narrow down, the set of parameters that go into the key derivation function.  In our example, if the attacker can guess the node id, the vfs id, and the chunk name, they could trivially derive the key to decrypt the data.  Predicting this metadata is likely to be much easier than attempting to guess a randomly generated key.  Therefore, while much simpler than managing keys, this approach is significantly less secure.
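
To make the concern concrete, here is a minimal Python sketch of the deterministic scenario. Everything in it is an assumption for illustration: the metadata fields, the naming patterns, the candidate ranges and the use of SHA-256 as the derivation function are hypothetical and are not taken from ParaScale’s design.

```python
import hashlib
import hmac

def derive_key(vfs_id, node_id, chunk_name):
    """Hypothetical derivation: hash write-request metadata into a 256-bit key."""
    material = "|".join([vfs_id, node_id, chunk_name]).encode("utf-8")
    return hashlib.sha256(material).digest()

def guess_key(target_key, vfs_ids, node_ids, chunk_names):
    """Try every metadata combination; return the one that reproduces target_key."""
    for vfs_id in vfs_ids:
        for node_id in node_ids:
            for chunk_name in chunk_names:
                candidate = derive_key(vfs_id, node_id, chunk_name)
                if hmac.compare_digest(candidate, target_key):
                    return (vfs_id, node_id, chunk_name)
    return None

# The storage system derives the key for one chunk (names are made up).
key = derive_key("vfs-17", "node-042", "chunk-000931")

# An attacker who knows the naming scheme only has to enumerate a small space.
vfs_candidates = [f"vfs-{i}" for i in range(32)]
node_candidates = [f"node-{i:03d}" for i in range(64)]
chunk_candidates = ["chunk-000931"]  # assumed to be visible on the stolen disk

found = guess_key(key, vfs_candidates, node_candidates, chunk_candidates)
search_space = len(vfs_candidates) * len(node_candidates) * len(chunk_candidates)
print(found, "recovered from", search_space, "candidates")
```

In this toy setup the attacker searches a few thousand candidates, compared with the 2^256 possibilities of a randomly generated 256-bit key.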

Finally, there was the quote from SearchStorage that the user’s authentication into the system is what generates the key.  The third possibility is that the user’s credentials are used to generate the encryption key.  This is similar to the above approach, except that instead of using information related to the write request to generate the key, something else, like the user’s password, would be used.  The limitation here is that users’ passwords are significantly easier to brute force than a 128-bit or 256-bit encryption key.  The other downside is that you wouldn’t be able to change your password without first re-encrypting the data or keys which are protected by the old password.
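
The password scenario can be sketched the same way. This is an illustrative assumption, not a description of ParaScale’s system: we assume the key comes from PBKDF2 (via Python’s standard `hashlib.pbkdf2_hmac`), and the password, salt, iteration count and word list are invented for the example.

```python
import hashlib

def key_from_password(password, salt, iterations=1000):
    """Assumed derivation: PBKDF2-HMAC-SHA256 producing a 256-bit key."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

def dictionary_attack(target_key, salt, candidates, iterations=1000):
    """Return the first candidate password that derives target_key, if any."""
    for candidate in candidates:
        if key_from_password(candidate, salt, iterations) == target_key:
            return candidate
    return None

salt = b"per-user-salt"  # hypothetical; assumed to be known to the attacker
user_key = key_from_password("sunshine1", salt)

# The attacker never guesses the 256-bit key itself, only likely passwords.
wordlist = ["password", "letmein", "qwerty", "sunshine1", "dragon"]
recovered = dictionary_attack(user_key, salt, wordlist)
print("recovered password:", recovered)

# A new password yields a different key, so anything encrypted under the
# old key would have to be re-encrypted (or a wrapped data key re-wrapped).
new_key = key_from_password("a-new-password", salt)
print("key changed:", new_key != user_key)
```

The derived key is 256 bits long, but its effective strength is only that of the password behind it.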

It is impossible for us to evaluate the security of ParaScale’s approach without more knowledge of where the keys are stored within the system. However, this much would at least be clear: since the ParaScale system itself is generating, managing and using these keys, it seems likely that there are one or more control nodes within their system which represent a single point of compromise.  While their encryption may protect against theft of a disk or node, the remote compromise of an on-line node would yield keys or decrypted user data.  For cloud environments, the most secure approach would be for the end-user, and only the end-user, to be in charge of putting their data back together, rather than having the cloud storage provider decrypt data for them.

Cleversafe’s approach to key management

Our approach is not only clearly explained, but relies on well known and analyzed techniques for achieving data security.  Moreover, it places control of assembling and decrypting data squarely in the hands of the end-user while still avoiding the need for key management.

When we completed our 2.0 announcement that included our SecureSlice™ technology, we discussed keyless encryption. With SecureSlice™ technology a service provider cannot go through a public cloud to access customer data or the encryption keys. Only the end-customers with credentials are in control of who can actually access their data. As such, the security of data comes down to the access control mechanism each customer puts in place.

Response Part 2: Complexities of Key Management

In a recent post we discussed some drawbacks of encryption.  We received a lot of feedback, both negative and positive. Now we’d like to explore some of the math behind that claim and clarify some of the points we made.

This is the second post in a 3 part series. We will discuss the following points in these three posts:

  1. Today’s encryption systems are designed to provide a sufficient level of security for protecting against today’s threats. These systems are not designed to provide security for long term storage over decades. The security benefits of Dispersal are more resistant to advances in computing power and mathematical discoveries.
  2. Today’s storage systems generally make a trade-off between confidentiality and availability. Systems which accomplish both do exist, but often at the expense of low storage efficiency. Dispersal allows the user to achieve high levels of confidentiality and availability while minimizing storage inefficiency.
  3. Given current laws, companies must disclose data loss regardless of whether data was encrypted or not. Dispersal changes the conversation because a typical loss event will result in the original data being mathematically impossible to recover from any lost components.

Confidentiality vs. Availability


When storing important confidential data, one has two goals:

  • Availability: The data should always be available to the authorized entities, and never be lost
  • Confidentiality: The data should be kept private and inaccessible to unauthorized entities

So simultaneously we must keep the data as available as possible to the right individuals while also keeping it as unavailable as possible to the wrong individuals. Traditionally, confidentiality is achieved using encryption. By encrypting the data all efforts toward confidentiality can focus on the key, because encrypted data remains private to someone who doesn’t have the key.

One could stop here, having achieved a decent level of confidentiality but only if one doesn’t care about losing the data. In the current state, such data would be highly vulnerable to loss. If the hard drive, CD-ROM, thumb drive or whatever media storing the key is lost, breaks or becomes corrupted then the encrypted data will remain forever irretrievable. Likewise any similar problem with the media storing the encrypted data will cause a loss of data.

One way to deal with the threat of loss in this situation is to make multiple copies. Back up the encrypted data to tape, or to another off-site location. Likewise, store multiple copies of the key in different locations to prevent the failure or loss of any one device from causing a loss of data. In doing so one will have achieved a decent level of availability.

In making all these copies to maintain availability, what have we done to our confidentiality? We now have multiple copies of encrypted data, and multiple copies of the key. Each copy represents another attack vector for an adversary, and another thing that must be protected. The more we try to enhance availability, the worse things become for confidentiality.

If only we could have the best of both worlds: a high level of availability AND confidentiality.

Secret Sharing


Secret Sharing schemes promise just that: high availability and confidentiality. They have been known for a long time, having been invented independently by Adi Shamir and George Blakley in 1979, and some key management systems use them for storing keys.

The basic operation of a secret sharing scheme is to take a secret, in this case a key, and create some number of shares from it. To recover the secret, some user-defined minimum number of shares must be used in the calculation. This provides high availability: a number of shares can be lost, and so long as you can obtain the minimum number required, you can get the key back. For example, let’s say we created 5 shares for a key, and made the minimum number required for recovery 3. In such a case we can tolerate the loss of any two shares, much as if we had made 3 copies but without the associated loss in confidentiality.

In fact, storing the key as shares greatly enhances confidentiality, beyond that of storing a single copy of a key. Again using the 5/3 (5 shares, 3 needed) example, if a single share is exposed, stolen, or otherwise compromised, the privacy of the key is maintained. Even if two shares are simultaneously revealed, the confidentiality of the data is maintained, because the threshold of 3 is not met. Locating and stealing multiple shares from potentially different locations is much harder for an adversary to pull off than gaining access to a single key at a single location.
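
The 5/3 example can be sketched with Shamir’s polynomial scheme. The code below is a minimal illustration, not a production implementation; the choice of prime field (2^521 - 1) and the function names are our own assumptions.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime, comfortably larger than a 256-bit key

def make_shares(secret, threshold, num_shares):
    """Split an integer secret into points on a random polynomial of degree threshold-1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    # The secret is the constant term; the other coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbits(256)                      # the secret to protect
shares = make_shares(key, threshold=3, num_shares=5)

# Any 3 of the 5 shares recover the key, so 2 shares may be lost.
print(recover_secret([shares[0], shares[2], shares[4]]) == key)
# Interpolating only 2 shares yields a value unrelated to the key.
print(recover_secret(shares[:2]) == key)
```

The modular inverse uses `pow(den, -1, PRIME)`, which requires Python 3.8 or later.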

Information Theoretic Security

Another benefit of secret sharing schemes is that they provide information theoretic security. This means that with fewer than the threshold number of shares, there simply isn’t enough information available to figure out what the secret is, even if one had infinite computing power or time to try to crack it. Providing this level of security, however, comes at a cost: every share needs to be the same size as the original secret. This is generally not a problem when storing something small like a key, which is usually less than a kilobyte in size; however, it makes secret sharing schemes impractical for bulk data storage.
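
A toy enumeration shows what “not enough information” means. This sketch assumes a Shamir-style scheme with a threshold of 2 over a deliberately tiny prime field (251), so that every possibility can be listed; the observed share value is arbitrary.

```python
P = 251  # tiny prime field, chosen only so exhaustive enumeration is feasible

def consistent_secrets(share):
    """For a threshold-2 scheme f(x) = s + a*x (mod P), count, for every
    candidate secret s, how many polynomials pass through the observed share."""
    x, y = share
    counts = {}
    for s in range(P):
        counts[s] = sum(1 for a in range(P) if (s + a * x) % P == y)
    return counts

# An attacker holds a single share, one fewer than the threshold.
counts = consistent_secrets((3, 100))

# Every candidate secret is explained by exactly one polynomial, so the
# share does not make any secret more likely than any other.
print(len(counts), set(counts.values()))
```

Because each of the 251 candidate secrets remains equally plausible, no amount of computing power helps an attacker who holds too few shares.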

To see why, imagine you are designing a secure storage system which needs to store 500 GB of data. You learned of secret sharing systems and want to use one to securely store that 500 GB of data. Let’s say you decide to create 5 shares, of which any 3 are needed to recover the data. Doing this will require buying 2,500 GB of storage, since each of the 5 shares will be of equal size to the data being stored. You therefore have a 5-fold increase in storage requirements, and with it a 5-fold increase in the cost of the system.

Wouldn’t it be great if there were a way to store information efficiently while keeping the availability and confidentiality of secret sharing schemes?

Dispersed Storage™ Technology


Dispersal, when combined with an All-or-Nothing Transform (AONT), makes such a thing possible. Achieving this level of efficiency requires that we abandon information theoretic security, but the practical benefits of information theoretic security are minor.  One-Time-Pad encryption is a type of encryption which is provably unbreakable, because it provides information theoretic security, but like secret sharing, its theoretical benefits simply don’t outweigh its practical costs when used for bulk data encryption.  Given that the level of security provided by the AONT can be set arbitrarily high (there is no limit to the length of key it uses for the transformation), information theoretic security is not necessary, as one can simply use a key so long that it could not be cracked before the stars burn out.

By dropping the requirement for information theoretic security we can achieve the highest theoretically possible efficiency for the given level of fault tolerance. Cleversafe is unique in providing information dispersal with an AONT and was the first to make secret-sharing-like systems for efficient bulk data storage. Unlike shares, slices are only a fraction of the size of the original data; specifically, each is 1/threshold the size. If we again look at the 5/3 example, instead of storing 5 shares each of which is 500 GB, we would store 5 slices, each of which is 500/3, or about 167 GB. The total storage requirement would therefore be 5*167, or about 835 GB, roughly one third the cost of the 2,500 GB system.

The availability and confidentiality benefits of dispersal become even greater the wider the dsNet. A common configuration we recommend is 16/10, that is, 16 slices with a threshold of 10. This setup has greater availability than the 5/3 example because we can tolerate the simultaneous loss of 6 slices, compared with 2 for the 5/3 setup. It is also more confidential, because an attacker would need to compromise 10 slices, not just 3. Surprisingly, the system is even more efficient: a 16/10 configuration would need 16 slices of 50 GB each, so it needs only 800 GB, not 835. Note that this is less costly than keeping just one additional copy of the data, which would require a total of 1,000 GB.
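
The storage figures in these examples follow from two simple formulas, sketched below in Python. The function names are ours; the only inputs are the data size, the number of shares or slices (the width) and the threshold.

```python
def secret_sharing_storage_gb(data_gb, width):
    """Each share is as large as the data, so the total is width * data."""
    return data_gb * width

def dispersal_storage_gb(data_gb, width, threshold):
    """Each slice is data/threshold in size, so the total is width * data / threshold."""
    return data_gb * width / threshold

def tolerated_losses(width, threshold):
    """Number of shares or slices that can be lost without losing the data."""
    return width - threshold

print(secret_sharing_storage_gb(500, 5))          # 2500
print(round(dispersal_storage_gb(500, 5, 3), 1))  # 833.3
print(dispersal_storage_gb(500, 16, 10))          # 800.0
print(tolerated_losses(5, 3), tolerated_losses(16, 10))  # 2 6
```

The exact 5/3 total is about 833.3 GB; the 835 GB figure comes from rounding each slice up to 167 GB before multiplying by 5.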

Fewer Headaches

With Dispersed Storage™ Technology there are fewer headaches than with existing key management systems. This is true for a number of reasons. The first is that one no longer has to worry about having to trade availability for confidentiality, or vice versa. One need not choose which is more important; both are important, and the storage system should reflect that fact. Dispersal provides the high availability and confidentiality of secret sharing schemes, and it can do it for your data directly. No separate system for storing keys is needed; it would be superfluous and would only hurt availability.

Availability will always be detrimentally affected by adding a key management system, because it is one more thing which can fail. If the key management system fails and you lose your keys, then you’ve also lost all your encrypted data.  Short of replicating the key many times or storing shares in many locations, existing key management systems cannot match the availability that information dispersal provides.  Therefore the key management system will be the weakest link in the chain and the more likely path to data loss.

The All-or-Nothing Transform merges the key and the data such that only one storage system is required, and that one system meets both the confidentiality and availability requirements.  The AONT is also not vulnerable to advances in math or quantum computers as RSA and ECC are, nor does it at any point rely on passwords which can be easily cracked or forgotten.

Diagrams

Encode Steps:

[Diagram: encode steps for the All-or-Nothing Transform and Information Dispersal]

All-Or-Nothing Transform

  1. Generate random symmetric key: R
  2. Encrypt data
  3. Calculate hash of encrypted data: H
  4. Append (H XOR R) to the encrypted data
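
The four encode steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not Cleversafe’s implementation: the standard library has no block cipher, so a SHA-256 counter-mode keystream stands in for the symmetric cipher, SHA-256 is assumed as the hash, and the key length is set equal to the digest length so that H XOR R is well defined.

```python
import hashlib
import secrets

KEY_LEN = 32  # bytes; assumed equal to the SHA-256 digest size

def keystream_xor(key, data):
    """Stand-in cipher (an assumption, not a vetted design): XOR the data
    with SHA-256(key || block counter). Encrypting twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

def aont_encode(data):
    r = secrets.token_bytes(KEY_LEN)                 # 1. random symmetric key R
    encrypted = keystream_xor(r, data)               # 2. encrypt data
    h = hashlib.sha256(encrypted).digest()           # 3. hash of encrypted data: H
    masked_key = bytes(a ^ b for a, b in zip(h, r))  #    H XOR R
    return encrypted + masked_key                    # 4. append the masked key

package = aont_encode(b"confidential payload")
print(len(package))  # payload length plus KEY_LEN bytes for the masked key
```

Note that the key R is never stored on its own: it exists only masked by a hash that depends on every byte of the encrypted data.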

Information Dispersal (configured with N/K)

  1. Add padding to data up to next multiple of K bytes (using PKCS5)
  2. Divide padded data into K equal pieces, these are the first K slices
  3. Using Reed-Solomon codes, compute (N-K) additional slices to provide forward error correction
  4. Send each of the N slices to a different storage node
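
Steps 1 and 2 of the dispersal encode can be sketched as below. The padding follows the PKCS convention of appending n bytes that each hold the value n; the function names are ours. Step 3, computing the N-K Reed-Solomon slices, is deliberately left out, since it needs Galois-field arithmetic that does not fit a short sketch.

```python
from typing import List

def pad(data: bytes, k: int) -> bytes:
    """Pad to the next multiple of k bytes by appending n bytes of value n
    (1 <= n <= k). Padding is always added, even if the data is aligned."""
    if not 1 <= k <= 255:
        raise ValueError("k must fit in a single padding byte")
    n = k - (len(data) % k)
    return data + bytes([n]) * n

def split(padded: bytes, k: int) -> List[bytes]:
    """Divide the padded data into k equal pieces: the first k slices."""
    if len(padded) % k != 0:
        raise ValueError("data must be padded to a multiple of k first")
    size = len(padded) // k
    return [padded[i * size:(i + 1) * size] for i in range(k)]

slices = split(pad(b"hello", 3), 3)
print(slices)  # [b'he', b'll', b'o\x01']
```

Each of the K slices here is 1/K the size of the padded data, which is where the storage efficiency of dispersal comes from.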

Decode Steps:

[Diagram: decode steps for Information Dispersal and the All-or-Nothing Transform]

Information Dispersal (configured with N/K)

  1. Request slices from any K of N storage nodes
  2. Using Reed-Solomon codes, compute any of the first K slices that are missing
  3. Concatenate the first K slices in the correct order
  4. Remove padding from the concatenated data
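
Steps 3 and 4 of the dispersal decode can be sketched as follows, assuming the first K slices are already in hand (retrieved directly, or rebuilt by Reed-Solomon decoding, which is not shown) and that the padding is PKCS-style: n trailing bytes, each holding the value n. The function name is ours.

```python
def join_and_unpad(slices):
    """Concatenate the first K slices in order, then strip the padding."""
    k = len(slices)
    padded = b"".join(slices)
    if not padded:
        raise ValueError("no data")
    n = padded[-1]  # the last byte says how many padding bytes were added
    if not 1 <= n <= k or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]

# Three slices as they would look for the 5-byte input b"hello" with K = 3.
print(join_and_unpad([b"he", b"ll", b"o\x01"]))  # b'hello'
```

The order of the slices matters: concatenating them in the wrong order produces different data, and usually an invalid padding as well.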

All-or-Nothing Transform

  1. Strip the appended masked key from the end of the data
  2. Calculate the hash of the encrypted data: H
  3. XOR the hash with the masked key to recover the random key: R
  4. Use the key to decrypt the data
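
The four decode steps can be sketched in Python as well. As before, this is an illustration and not Cleversafe’s implementation: a SHA-256 counter-mode keystream stands in for the symmetric cipher, SHA-256 is assumed as the hash, and a compact encoder is included only so the round trip can be run.

```python
import hashlib
import secrets

KEY_LEN = 32  # bytes; assumed equal to the SHA-256 digest size

def keystream_xor(key, data):
    """Stand-in cipher (an assumption): XOR data with SHA-256(key || counter)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def aont_encode(data):
    """Compact encoder, included only to produce input for the decoder."""
    r = secrets.token_bytes(KEY_LEN)
    encrypted = keystream_xor(r, data)
    return encrypted + xor_bytes(hashlib.sha256(encrypted).digest(), r)

def aont_decode(package):
    if len(package) < KEY_LEN:
        raise ValueError("package is too short to contain a masked key")
    encrypted, masked_key = package[:-KEY_LEN], package[-KEY_LEN:]  # 1. strip masked key
    h = hashlib.sha256(encrypted).digest()                          # 2. hash: H
    r = xor_bytes(h, masked_key)                                    # 3. recover R
    return keystream_xor(r, encrypted)                              # 4. decrypt

message = b"All of the package is needed to recover any of the data."
package = aont_encode(message)
recovered = aont_decode(package)
print(recovered == message)

# Flip a single bit: the hash changes, so the wrong key is unmasked
# and nothing of the original message comes back.
damaged = bytes([package[0] ^ 0x01]) + package[1:]
garbled = aont_decode(damaged)
print(garbled == message)
```

The damaged case illustrates the all-or-nothing property: an incomplete or altered package gives the wrong hash, the wrong key, and therefore none of the plaintext.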

Explanation

Ordinarily the availability would be hurt by using an All-or-Nothing Transform and storing just the divided pieces in different locations; this is where the information dispersal algorithm (IDA) comes to the rescue.  By creating additional code slices we can recover from situations in which some of the data slices are missing or otherwise not available.  Just how many slices are required is determined by the user-configurable threshold, which in the above diagrams is 5.  So long as any 5 slices are still recoverable, the entirety of the All-or-Nothing Transformed data can be recovered, and thus the original data can be as well.

Looking at how the decode operation works, one can see the criticality in having ALL of the data.  Without all of the data in its entirety one cannot compute the digest of the encrypted data, and therefore one cannot un-mask the random key.  Without knowledge of the random key NOTHING of the original data can be decrypted.

Costs

It may seem magical that all these goals can be achieved, but the benefits are not without cost.  Dispersed Storage™ Systems have requirements beyond those of most existing storage systems.  Achieving the full benefits of dispersal requires infrastructure: multiple sites, high bandwidth connections, at least as many computers as the IDA’s configured width, memory for managing TCP connections, and CPU power for encoding and decoding slices.  For these reasons dispersal may not be an option for everyone; however, if one has massive storage requirements, desires the utmost reliability or security, or already has the infrastructure in place, then dispersal may be the ideal storage solution.

Please stay tuned for the final blog post in this series, where we will explore disclosure laws and their impact on businesses.

3 Reasons Why Encryption is Overrated

UPDATE 7-31-09:
This post caused a great deal of controversy.  Some readers left with the impression that we believe encryption to be obsolete or unnecessary.  That was not our intended message; rather, it was to expose common problems with conventional approaches to data encryption and to show what dispersal offers to address them.  Other readers disputed the veracity of our claims, which is not surprising given that the post lacked technical details to back them up.  To provide technical details in defense of the claims made in this post, we have written three follow-up responses: Part 1, Part 2, and Part 3, which we invite you to read.

When it comes to storage and security, discussions traditionally center on encryption.  The reason encryption – or the use of a complex algorithm to encode information – is accepted as a best practice rests on the premise that while it’s possible to crack encrypted information, most malicious hackers don’t have access to the amount of computer processing power they would need to decrypt information.

But not so fast.  Let’s take a look at three reasons why encryption is overrated.

1) Future processing power

While processing power today may keep encrypted files (that are stored in the cloud, for example) safe, as processing power improves, archived encrypted files will require systematic re-encryption to remain safe from potential hackers. Systematic re-encryption, though, is difficult, laborious and expensive.

2) Key management

To decode the encrypted files, a user needs the encryption key.  Unfortunately, managing a large number of encryption keys can be painful. Yes, there are enterprise key management (EKM) solutions that promise the ability to manage and change keys throughout their life cycle, but these serve more as a band-aid for the fundamental pain of dealing with numerous keys. As a chain is only as strong as its weakest link, an enterprise key manager is only as good as the integrated key management systems that use it. If any system downstream from a secure key manager exposes the key, or is not designed to cover a certain threat, the whole system becomes insecure.

3) Disclosure laws

Beyond technology, breach disclosure laws, which require organizations to notify individuals when personal information has been, or is reasonably believed to have been, acquired by an unauthorized entity, can result in a PR nightmare for a business that encryption can’t resolve.  The Privacy Rights Clearinghouse maintains a compilation of data breaches since 2005 that expose individuals to identity theft, as well as breaches that qualify for disclosure under state laws.  It is not a short list.

A technologist with a good understanding of encryption methods may be comfortable with some of the breaches or data losses reported due to the strengths of the encryption.  But this doesn’t matter in the court of public opinion; once data – encrypted or not – is lost, so is the trust of the general public.  Encryption is simply not enough to counter business concerns about the security of their data.

Consider Dispersal

In the interest of full disclosure, Cleversafe’s storage solution is based on Dispersal. With that said, consider its security benefits. Dispersed Storage technology divides data into slices, which are stored in different geographies.  Each slice contains too little information to be useful, but any threshold number of slices can be used to recreate the original data.  In other words, a malicious party cannot recreate data from a slice, or two, or three, no matter what the advances in processing power.  And Dispersal does not require the time and energy of re-encryption to sustain data protection.

Maybe encryption alone is “good enough” in some cases now  – but Dispersal is “good always” and represents the future.