In a recent post we discussed some drawbacks of encryption. We received a lot of feedback, both negative and positive.
This is the final post in a three-part series. In the first post we discussed how today’s encryption systems may not provide the necessary security for long-term storage applications and how the benefits of dispersal remain regardless of advances in computing or mathematical theory. The second post explored some of the complexities of key management and described Cleversafe’s answer to the problem: a new technique for storing data efficiently with the confidentiality and availability guarantees of secret sharing schemes.
In today’s post we will analyze exactly how data exposures happen in the real world, what the ramifications are both for the company responsible and for the customers whose private data is lost, and finally what dispersal can offer to mitigate the threat.
Paths to Data Exposure
A study titled “Using Science to Combat Data Loss: Analyzing Breaches by Type and Industry” looked at 899 different data breaches over the years 2005-2008 and classified each breach by its type. Its authors developed a classification system that consists of three top-level categories: Physical, Logical and Procedural. Each category is further sub-divided as shown below:
Physical (42.94%)
- Docs (4.22%) – Compromise of physical non-electronic media such as paper documents
- Media (9.57%) – Includes theft or loss of electronic media such as flash drives, hard drives, and tape
- Hardware (29.14%) – Loss of a complete computer, such as a PDA, Laptop or Server
Logical (27.80%)
- Insider Action (6.23%) – Action by a customer, student, employee, contractor or partner
- Compromise (21.58%) – Compromise of data by exploitation of a vulnerability in a system
Procedural (29.25%)
- Processing Error (22.14%) – Legitimate business activity leading to exposure of data, e.g., accidentally putting a sensitive file on a public web site
- Disposal (7.12%) – Careless throwing away or abandonment of sensitive information, such as not wiping a disk or shredding a document before disposal
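As a quick sanity check on the figures above, the sub-category shares can be tallied in a few lines of Python. This is only an illustrative sketch: the names `BREACH_SHARES`, `category_totals` and `largest_subcategory` are made up for this example, and the numbers are simply copied from the list. The sums come out within 0.01 of the quoted category totals, which is what one would expect if the study rounded each figure independently.

```python
# Sub-category shares (percent of the 899 breaches), copied from the list above.
# The dictionary layout and function names are assumptions made for this sketch.
BREACH_SHARES = {
    "Physical": {"Docs": 4.22, "Media": 9.57, "Hardware": 29.14},
    "Logical": {"Insider Action": 6.23, "Compromise": 21.58},
    "Procedural": {"Processing Error": 22.14, "Disposal": 7.12},
}


def category_totals(shares):
    """Sum the sub-category percentages for each top-level category."""
    return {cat: round(sum(subs.values()), 2) for cat, subs in shares.items()}


def largest_subcategory(shares):
    """Return the (category, sub-category) pair with the largest share."""
    flat = {
        (cat, sub): pct
        for cat, subs in shares.items()
        for sub, pct in subs.items()
    }
    return max(flat, key=flat.get)


if __name__ == "__main__":
    for cat, total in category_totals(BREACH_SHARES).items():
        print(f"{cat}: {total:.2f}%")
    print("Largest single sub-category:", largest_subcategory(BREACH_SHARES))
```

Run this way, the tally identifies the loss of complete computers (Hardware) as the largest single sub-category, ahead of Processing Error and Compromise.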
Since laws have been passed mandating that companies report data breaches, over 262,000,000 personal records have been reported exposed. There is a high probability that you are counted among those whose personal information has been lost. I know that I am. The high frequency at which these breaches occur is an inherent property of the way data is traditionally stored and backed up. As you will see later in this post, there is a new and better way of storing data which can greatly diminish the risk of exposure.
Consequences of a Data Loss
The consequences of a data exposure can be dire, for both the business and the individual whose data was compromised. For the business, a loss can lead to civil liability, as was the case when ChoicePoint settled for $10 million in civil penalties and $5 million for consumer redress over the loss of 163,000 records leading to 800 cases of identity theft. It can also lead to fines, loss of confidence, loss of business and in general a public relations nightmare.
Even if one’s business does not retain customer information, every business has confidential data of some form or another, be it trade secrets, intellectual property, financial records, employee information, customer lists or other critical data which would be devastating were it to fall into the wrong hands. Therefore every business should have a strong motivation to reduce the risk of data breaches.
The business isn’t the only entity impacted by an exposure. For the individual, having one’s personal information exposed can be a nightmare. It can lead to identity theft, damage to one’s credit rating, and the inconvenience of a frozen credit report; what’s worse, these credit-related issues can have effects lasting for years. There is also the violation of one’s privacy and confidence. Few would be happy with the news that their medical, financial, or academic records were exposed. As our lives become increasingly information driven and databases become larger, more detailed and more common, we stand to lose more and more from such breaches.
How Dispersal Helps
Dispersal reduces the threat of data exposure for all seven types of breaches identified by the study. This is possible because of the inherent confidentiality properties which are unique to dispersal and other secret sharing schemes. Let us explore how dispersal helps mitigate or eliminate altogether the risk of exposure for each type of breach.
- Docs: Information Dispersal deals only with digital representations of information. Therefore the loss of physical documents containing sensitive information does not apply to dispersed storage.
- Media: Often back-up media are lost during transportation to an off-site location. Dispersal avoids this problem altogether by using cryptographically secure Internet connections to distribute slices of data to different sites. No single site contains a complete copy of the data and therefore there is no opportunity for media containing a backup to be intercepted. If a hard drive, or even all the hard drives from a single site, were stolen, they would contain no usable information due to the properties of dispersal. The only way an attacker could physically compromise the system would be to obtain media from a threshold number of sites, each of which may be separated by geographical distances. Few systems in the world are as difficult to physically compromise as dispersed storage systems.
- Hardware: As is the case with individual hard drives, individual servers containing dispersed data are useless unless one possesses a threshold number of them, which must be taken from many different physical locations. If the attacker fails to procure the necessary number of machines then they can obtain no useful information because the slices stored on the machines are meaningless with anything short of the threshold. For details on how this technology works, please see the previous posting.
- Insider Action: Every imaginable system will have some vulnerability to insider action, because the data needs to be used somehow. Therefore the ideal system would provide some guarantee that only those with the right authorization could cause insider breaches. A study by Verizon found that roughly half of insider abuses were committed not by users with proper authorization to the data but by system administrators. By using dispersal, one can eliminate the threat that administrators add to the equation. To achieve this, no single administrator should have access to a threshold number of servers from the same set. For example, if data is dispersed to 4 locations, administrators at each site could be given permission to access only those machines at the site where they work. In this way, no single administrator can willingly or accidentally cause the data to be exposed. Instead it would take a conspiracy of administrators working together, and this is much rarer than a lone disgruntled administrator.
- Compromise: As was the case for physical attacks on the hardware, the same threshold requirement exists for compromise: Multiple machines must be compromised for there to be any threat of an exposure. Therefore any successful attack against a dispersed storage system must be targeted, well coordinated and manual. Additionally, a threshold number of machines must also have an unpatched remotely exploitable vulnerability. The Verizon study found that in 2008, organized criminal activity accounted for 91% of compromises. Since there is usually a profit motive behind these attacks, organized criminals will seek the easier targets over harder ones. Therefore the increased technical and logistical difficulty of compromising dispersed data will serve as a deterrent given the availability of lower-hanging fruit.
- Processing Error: It is common for systems to suffer from an occasional misconfiguration leading to accidental exposure. With dispersal, multiple systems could simultaneously be misconfigured or otherwise made insecure without causing a breach of the data. In traditional storage systems, one often doesn’t realize there is a misconfiguration until it is too late, with the majority of exposures being noticed and reported by third parties. Dispersal allows such mistakes to be noticed and corrected before the loss of data occurs.
- Disposal: While proper wiping is still recommended for media containing dispersed data, if one did make the mistake of throwing out hard drives that stored dispersed data, it wouldn’t be possible to obtain any useful information without gathering disposed drives from many different locations. If someone came across a dumpster filled with hard drives all from the same dispersed storage site, those disks alone would be insufficient for extracting any useful data.
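The threshold property that the points above rely on can be illustrated with a small Python sketch. It uses Shamir's secret sharing as a stand-in, since dispersal is described in this series as having the confidentiality guarantees of secret sharing schemes; it is not Cleversafe's actual dispersal algorithm, which is more storage-efficient. The names `PRIME`, `split_secret` and `recover_secret`, and the choice of 4 slices with a threshold of 3, are assumptions made purely for illustration.

```python
import secrets

# A Mersenne prime larger than the secret split below (an assumption of this sketch).
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_slices):
    """Split `secret` into `num_slices` slices; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= num_slices:
        raise ValueError("need 1 <= threshold <= num_slices")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, with all arithmetic modulo PRIME.
        result = 0
        for coeff in reversed(coeffs):
            result = (result * x + coeff) % PRIME
        return result

    # Slice i is the point (i, f(i)); x = 0 is never handed out.
    return [(x, evaluate(x)) for x in range(1, num_slices + 1)]


def recover_secret(slices):
    """Lagrange-interpolate the polynomial at x = 0 from the given slices."""
    secret = 0
    for i, (xi, yi) in enumerate(slices):
        num, den = 1, 1
        for j, (xj, _) in enumerate(slices):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    original = 123456789
    # Four sites, any three of which are needed to rebuild the data.
    site_slices = split_secret(original, threshold=3, num_slices=4)
    print(recover_secret(site_slices[:3]) == original)  # three sites suffice
    print(recover_secret(site_slices[1:]) == original)  # any three will do
```

In this scheme, any two slices are consistent with every possible secret, so an attacker who steals the drives from one or two sites learns nothing about the original value. Interpolating from only two slices returns an unrelated number, except with negligible probability.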
Over 75% of states now have laws requiring notification in the event of a data exposure. Additionally, there have been a few federal bills (S 1789 and HR 4127) which would standardize the requirements for disclosure. As currently proposed, the bills would require a notice unless “no risk” exists. Therefore if one uses dispersal to store data and by some mistake one or two slice servers go missing, one would not have to make a disclosure notice, since there is no risk of data exposure so long as fewer than a threshold number of servers have been compromised.
Currently all existing state laws treat encryption as a safe harbor, i.e., if encrypted data is lost a company doesn’t have to report it. However, two of the proposed federal bills do away with the encryption safe harbor. Dispersal is a new technology which, unlike encryption, does not rely on the security of a key stored somewhere else; the slices of the data themselves are the keys. Therefore under the bills as written, dispersal may be considered a safe harbor when encryption is not. By using dispersal, a company not only reduces the risk of exposure but also would avoid having to make embarrassing admissions each time a single machine is compromised.
This concludes the three-part series regarding the limitations of existing encryption in existing storage systems. We hope that you enjoyed reading the posts and that you learned something new and interesting. Our next post will be on the topic of reliability and how to calculate it for highly fault-tolerant dispersed storage systems.

Interesting findings. The former two areas are well known and have [somewhat decent] countermeasures in place.
I’d love to hear what others are doing about the Procedural area.
This is a much harder area to address, and I only see a few resources like FAIR (a risk management framework) to address this universally.
If the biggest single area of data compromise is the loss of a complete computer, such as a PDA, laptop or server (29.14%), shouldn’t a way to remotely scrub these devices be found? If the lost PDA, laptop or server ever went online again it would announce itself and the sensitive data could be erased or destroyed or even recovered and removed.
That approach can work, but it is hardly foolproof. Remote scrubbing assumes that the device will again be connected to the network and allowed to phone home, and that the disk or flash storage device is not taken out and analyzed. For integrated devices with wireless or cellular Internet connections this approach can work a large percentage of the time, but the only way to be 100% sure they get nothing from stealing the hardware is to have the data on said device useless without some additional piece of information, be it a key, passphrase, or other slices.