Another week goes by in cloud computing with more vendor hyperbole leaving us and a few others scratching our heads. ParaScale’s recent announcement is a good example. We took the liberty of pulling a few quotes for reference:
“ParaScale also provides highly scalable data encryption that secures data without requiring storage of keys within the cloud or third-party key management. Cloud storage management functions such as replication and migration are performed on encrypted content with multiple protocols and algorithms supported.”
Source: Computer Technology Review
“With ParaScale’s keyless encryption, a user’s authentication into the back-end system generates an encryption key on the fly to write to the user’s apportioned virtual file system (VFS). A similar process will allow users secure access to reading data.”
Source: SearchStorage
“The enhanced security measures also means that data on stolen discs or nodes can’t be accessed unless the thief has the dedication to make an exact copy of your storage cluster on their end, Norris says.”
Source: Computer Technology Review
From our standpoint, although ParaScale claims highly secure data encryption, no details are offered regarding how the keys are generated, or where, if at all, the encryption keys are stored within ParaScale’s system. It would be incorrect to assume encrypting one’s data is the end of the story when it comes to storing data securely. The biggest outstanding question, and where most people go wrong in their implementation, is in how keys are managed and secured. Without this information it is impossible to evaluate the security of ParaScale’s approach.
Since little information appears to be publicly available, we attempted to fill in some of the details we thought were missing from their release. Below we consider several possibilities that fit the scant information available, and analyze the pros and cons of each approach.
The first possibility is that ParaScale is generating keys randomly. Since the process of generating encryption keys is random, decrypting the data requires recovering the same randomly generated key from some location. If ParaScale is taking this approach, the question is how and where are these keys being secured? Are they stored on other disks, other nodes? What process is able to recover these keys and what stops an attacker from mimicking that process?
One possible answer is that ParaScale’s system maintains an internal private key, perhaps one for each “virtual file system”, which is used to encrypt the random keys; the encrypted keys are then stored alongside the encrypted data. Should this be the case, where are the private keys being stored, and what would happen to the user’s data should there be data loss affecting the private key? If replication is used to protect the keys against loss, then each copy is another vector for attack.
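To make the trade-off concrete, here is a minimal sketch of this "wrap the random key with an internal key" arrangement. It is our own illustration, not ParaScale's design: the names wrap_key, unwrap_key and master_key are hypothetical, and the XOR-with-HMAC pad is a toy stand-in for a real key-wrapping algorithm, chosen only so the example runs on the Python standard library.

```python
import hashlib
import hmac
import secrets


def _pad(master_key: bytes, nonce: bytes) -> bytes:
    # 32-byte pad derived from the master key and a per-key nonce.
    return hmac.new(master_key, nonce, hashlib.sha256).digest()


def wrap_key(master_key: bytes, data_key: bytes) -> tuple:
    # Toy wrap (illustration only): XOR the 32-byte data key with the pad.
    nonce = secrets.token_bytes(16)
    pad = _pad(master_key, nonce)
    wrapped = bytes(a ^ b for a, b in zip(data_key, pad))
    return nonce, wrapped


def unwrap_key(master_key: bytes, nonce: bytes, wrapped: bytes) -> bytes:
    # XOR with the same pad undoes the wrap, but only with the right master key.
    pad = _pad(master_key, nonce)
    return bytes(a ^ b for a, b in zip(wrapped, pad))


master_key = secrets.token_bytes(32)   # the internal per-VFS key (assumed)
data_key = secrets.token_bytes(32)     # the random key that encrypts the data

nonce, wrapped = wrap_key(master_key, data_key)   # stored next to the data
recovered = unwrap_key(master_key, nonce, wrapped)

# Without the original master key the stored blob yields a useless value.
wrong = unwrap_key(secrets.token_bytes(32), nonce, wrapped)
```

The wrapped key can sit safely beside the data, but everything now hinges on master_key: lose it and every data key is gone, copy it to more nodes and each copy becomes a target.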
A second possibility is that the keys are generated deterministically instead of randomly. The advantage of generating keys deterministically is that no keys need to be stored anywhere. The way this would work is that some set of information related to the write request is entered into a function to derive a key. Then when reading the data, the same information could be available to generate the same key and decrypt the data. For example, that information might consist of the “virtual file system id”, “the id of the node where the data is stored”, “the name of the data chunk being stored”, etc.
This seems to fit with the quote from Norris: “data on stolen discs or nodes can’t be accessed unless the thief has the dedication to make an exact copy of your storage cluster on their end”. If this is their approach, an attack would be a little easier than creating an exact copy of the storage cluster: the attacker just needs to be able to predict, or sufficiently narrow down, the set of parameters that go into the key derivation function. In our example, if the attacker can guess the node id, the vfs id, and the chunk name, they could trivially derive the key to decrypt the data. Predicting this metadata is likely to be much easier than attempting to guess a randomly generated key. Therefore, while much simpler than managing keys, this approach is significantly less secure.
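The following sketch illustrates the concern using our own example parameters. Everything in it is an assumption made for illustration: the derive_key function, the use of SHA-256, the id ranges, and the idea that the attacker can confirm a guess against an integrity tag stored with the chunk (in practice, recognizable plaintext structure could serve the same purpose).

```python
import hashlib
import hmac
from itertools import product


def derive_key(vfs_id: int, node_id: int, chunk_name: str) -> bytes:
    # Hypothetical deterministic derivation: hash the request metadata.
    material = f"{vfs_id}|{node_id}|{chunk_name}".encode()
    return hashlib.sha256(material).digest()


def key_check(key: bytes, ciphertext: bytes) -> bytes:
    # Assumed integrity tag stored with the chunk; lets a guess be confirmed.
    return hmac.new(key, ciphertext, hashlib.sha256).digest()


def guess_key(ciphertext, tag, vfs_ids, node_ids, chunk_names):
    # Enumerate every plausible metadata combination and test each key.
    tries = 0
    for vfs_id, node_id, name in product(vfs_ids, node_ids, chunk_names):
        tries += 1
        candidate = derive_key(vfs_id, node_id, name)
        if hmac.compare_digest(key_check(candidate, ciphertext), tag):
            return candidate, tries
    return None, tries


# What the storage system does when writing the chunk.
real_key = derive_key(42, 7, "chunk-0003")
ciphertext = b"opaque bytes read from the stolen disk"
tag = key_check(real_key, ciphertext)

# What a thief does: the chunk name is visible on the disk, and the ids
# are assumed to fall in small ranges (100 file systems, 16 nodes).
found, tries = guess_key(ciphertext, tag, range(100), range(16), ["chunk-0003"])
```

Under these assumptions the search space is at most 100 × 16 = 1,600 candidates, compared with 2^128 or more for a randomly generated key.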
Finally, there was the quote from SearchStorage stating that the user’s authentication into the system is what generates the key. The third possibility, then, is that the user’s credentials are used to generate the encryption key. This is similar to the above approach, except that instead of using information related to the write request to generate the key, something else, like the user’s password, would be used. The limitation here is that users’ passwords are significantly easier to brute force than a 128-bit or 256-bit encryption key. The other downside is that you wouldn’t be able to change your password without first re-encrypting the data or keys which are protected by the old password.
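A rough sketch of why password-derived keys are weaker follows. The choice of PBKDF2, the salt, the iteration count (kept low so the example runs quickly) and the word list are all our assumptions, and the attacker is simplified to comparing keys directly rather than test-decrypting data.

```python
import hashlib
import math

SALT = b"per-user-salt"   # hypothetical fixed salt for the demo
ITERATIONS = 1_000        # deliberately low so the example runs quickly


def key_from_password(password: str) -> bytes:
    # Derive a 256-bit key from the password with PBKDF2-HMAC-SHA256.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode(), SALT, ITERATIONS, dklen=32
    )


def dictionary_attack(target_key: bytes, wordlist):
    # Try each candidate password until the derived key matches.
    for word in wordlist:
        if key_from_password(word) == target_key:
            return word
    return None


# An 8-character lowercase password has 26**8 possibilities,
# which is under 38 bits, versus 128 or 256 bits for a random key.
password_bits = math.log2(26 ** 8)

target = key_from_password("sunshine")
recovered = dictionary_attack(
    target, ["password", "letmein", "sunshine", "qwerty"]
)
```

The key is 256 bits long, but it is only as unpredictable as the password behind it. The same sketch also shows the password-change problem: key_from_password with a new password returns a different key, so anything protected by the old key must be re-encrypted.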
It is impossible for us to evaluate the security of ParaScale’s approach without more knowledge of where the keys are stored within the system. However, this much would at least be clear: since the ParaScale system itself is generating, managing and using these keys, it seems likely that there are one or more control nodes within their system, each of which represents a single point of compromise. While their encryption may protect against theft of a disk or node, the remote compromise of an online node would yield keys or decrypted user data. For cloud environments, the most secure approach would be for the end-user, and only the end-user, to be in charge of putting their data back together, rather than having the cloud storage provider decrypt data for them.
Cleversafe’s approach to key management
Our approach is not only clearly explained, but relies on well-known and well-analyzed techniques for achieving data security. Moreover, it places control of assembling and decrypting data squarely in the hands of the end-user while still avoiding the need for key management.
When we completed our 2.0 announcement that included our SecureSlice™ technology, we discussed keyless encryption. With SecureSlice™ technology a service provider cannot go through a public cloud to access customer data or the encryption keys. Only the end-customers with credentials are in control of who can actually access their data. As such, the security of data comes down to the access control mechanism each customer puts in place.


