Rebuilding times are a hot topic, particularly in light of the industry's adoption of 1TB drives and the pending arrival of 2TB drives. The problem is that as storage density increases, so does the likelihood of encountering an Unrecoverable Read Error (URE) while reading an entire drive, because URE rates have not improved beyond 1 error per 10^14 to 10^15 bits read. (Read "RAID's days may be numbered" for more details.)
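To see why density matters, the sketch below estimates the chance of reading a given amount of data without hitting a single URE. It assumes bit errors are independent and occur at exactly the quoted rate, which is a simplification (real UREs tend to cluster), and it uses decimal terabytes. At 1 in 10^14 bits, a full read of one 1TB drive succeeds roughly 92% of the time, but reading seven such drives, as a rebuild of an eight-drive array would, drops that to about 57%.

```python
import math

TB = 10**12  # decimal terabyte, as drive vendors use it

def prob_no_ure(bytes_read: float, ure_rate_bits: float) -> float:
    """Probability of reading `bytes_read` bytes with no URE, assuming
    independent bit errors at 1 error per `ure_rate_bits` bits."""
    bits = bytes_read * 8
    # log1p keeps precision for the tiny per-bit error probability
    return math.exp(bits * math.log1p(-1.0 / ure_rate_bits))

for rate in (1e14, 1e15):
    for drives in (1, 7):  # one drive, or the 7 survivors of an 8-drive array
        p = prob_no_ure(drives * TB, rate)
        print(f"1 in {rate:.0e} bits, {drives} TB read: P(no URE) = {p:.3f}")
```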
Rebuilding concerns revolve around two key issues:
- Am I vulnerable to data loss while the disk is rebuilding?
- Can I balance rebuilding time with typical I/O demands on the system?
Rebuilding with RAID
RAID typically stores data in arrays of drives, and the bottleneck for rebuilding is the read/write speed of those drives. If a drive fails, its contents are rebuilt by reading the data and parity stored on the remaining drives, all of which sit within the same storage appliance. (See figure 1a)
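As a minimal illustration of parity-based rebuilding, the sketch below uses single XOR parity (RAID 5 style): the parity block is the XOR of the data blocks, so any one lost block equals the XOR of everything that survives. RAID 6 adds a second, Reed-Solomon-style syndrome so that two losses can be repaired; that part is not shown here. Block contents and sizes are arbitrary.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus one parity block
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the drive holding data[1], then rebuild it from the survivors
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'EFGH'
```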
Based on many published studies of RAID rebuild times, rebuilding a 1 terabyte drive typically takes 1-3 days on an idle system, and 3-5 times longer under a heavier I/O load.
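The quoted figures can be turned into an implied rebuild throughput with simple arithmetic. Assuming a decimal terabyte, 1-3 days for 1TB works out to an average of only about 4 to 12 MB/s, and the 3-5 times multiplier under load stretches the same rebuild to roughly 3 to 15 days.

```python
TB = 10**12  # decimal terabyte

def implied_throughput_mb_s(capacity_bytes: float, hours: float) -> float:
    """Average rebuild throughput (MB/s) implied by a rebuild duration."""
    return capacity_bytes / (hours * 3600) / 1e6

idle_days = (1, 3)      # idle-system range quoted above
load_factors = (3, 5)   # slowdown under heavier I/O load

for days in idle_days:
    print(f"idle, {days} day(s): {implied_throughput_mb_s(TB, days * 24):.1f} MB/s")

loaded_days = (idle_days[0] * load_factors[0], idle_days[1] * load_factors[1])
print(f"under load: {loaded_days[0]}-{loaded_days[1]} days")
```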
Rebuilding using Dispersal
The main difference between rebuilding with RAID and with Dispersal is that RAID stripes its data across multiple drives in a single hardware appliance, whereas Dispersal disperses Slices across multiple drives in multiple hardware appliances. When a drive is rebuilt, each Slice stored on the failed drive is regenerated by reading a threshold number of Slices from the remaining Slicestor appliances. (See figure 1b)
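The threshold property can be illustrated with a toy Reed-Solomon-style code: k data symbols define a polynomial of degree below k, each of the n slices is that polynomial evaluated at a different point, and any k slices pin the polynomial down again by Lagrange interpolation. This is a sketch over a prime field for readability; it is not Cleversafe's actual information dispersal algorithm, and production codes typically work over GF(2^8) with far more efficient arithmetic. The names `encode`, `decode`, and `rebuild_slice` are hypothetical.

```python
P = 2**31 - 1  # a prime; all arithmetic is modulo P (toy field)

def _lagrange_eval(points, x):
    """Evaluate the unique polynomial of degree < len(points) through `points` at x."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, width):
    """Turn k data symbols into `width` slices; slices 1..k equal the data."""
    points = list(enumerate(data, start=1))
    return {x: _lagrange_eval(points, x) for x in range(1, width + 1)}

def decode(slices, threshold):
    """Recover the k data symbols from any `threshold` slices."""
    points = list(slices.items())[:threshold]
    return [_lagrange_eval(points, x) for x in range(1, threshold + 1)]

def rebuild_slice(slices, threshold, lost_x):
    """Regenerate one lost slice from any `threshold` surviving slices."""
    return _lagrange_eval(list(slices.items())[:threshold], lost_x)

data = [104, 101, 108, 108, 111, 33, 7, 42, 99, 5]  # k = 10 symbols
slices = encode(data, width=16)                     # 16/10 configuration
# Lose six slices (six failed appliances) and work from the remaining ten
surviving = {x: y for x, y in slices.items() if x not in {1, 3, 5, 7, 9, 11}}
print(decode(surviving, threshold=10) == data)      # True
```

Here six of the 16 slices are discarded and the data still decodes, and `rebuild_slice` regenerates any lost slice from 10 survivors, mirroring how a Slice on a failed drive is rebuilt from a threshold number of Slices on other appliances.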
With Dispersal, network speed plays an important role, because the network can become a bottleneck in the rebuild process. As a result, rebuilding with Dispersal may actually take longer than with RAID if the network cannot supply data fast enough to write to the replacement disk at full speed.
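A rough model makes the bottleneck concrete. It assumes the node regenerating a lost Slice must pull a threshold number of equally sized Slices across the network for every byte it writes, so with a threshold of 10 the network carries about ten times the rebuilt data, and the slower of network ingest and disk write sets the pace. The link and disk speeds in the example are hypothetical.

```python
def rebuild_hours(rebuilt_bytes, threshold, net_bytes_per_s, disk_bytes_per_s):
    """Rough rebuild time in hours.

    Assumes each rebuilt byte requires `threshold` bytes fetched over the
    network (threshold Slices, each the size of the lost one), and that
    network transfer and disk writes overlap, so the slower one dominates.
    """
    net_seconds = rebuilt_bytes * threshold / net_bytes_per_s
    disk_seconds = rebuilt_bytes / disk_bytes_per_s
    return max(net_seconds, disk_seconds) / 3600

TB = 10**12
print(f"1 Gb/s:  {rebuild_hours(TB, 10, 125e6, 100e6):.1f} h")   # network-bound
print(f"10 Gb/s: {rebuild_hours(TB, 10, 1.25e9, 100e6):.1f} h")  # disk-bound
```

With a 1 Gb/s link (about 125 MB/s) and a 100 MB/s disk, rebuilding 1TB is network-bound at roughly 22 hours; at 10 Gb/s the disk becomes the limit at under 3 hours.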

Comparing RAID and Dispersal Data Protection
In a 16-wide, 10-threshold Dispersal configuration (16/10), data is sliced and dispersed across 16 Slicestor appliances, and any 10 of them are sufficient to perfectly recreate the data. As a result, up to six simultaneous appliance failures can occur without data loss. With the 16 appliances spread evenly across 4 geographically dispersed sites (four per site), the system can tolerate an entire site failure plus 2 additional appliance failures while still providing seamless access to data.
In a typical RAID 6 configuration of eight drives, with two drives' worth of capacity dedicated to parity, only two simultaneous errors can be tolerated. Any further error (a drive failure or a URE) will result in data loss.
Comparing RAID 6 with Dispersal 16/10, the Dispersal system could absorb four simultaneous errors and still be at the same starting point, in terms of data protection, as a healthy RAID 6 system: both would then be able to tolerate two further simultaneous errors.
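The arithmetic behind these statements is just width minus threshold. The sketch below checks the site-failure scenario (assuming the 16 appliances are spread evenly, four per site) and the point at which a 16/10 system that has already absorbed four errors has the same remaining margin as a healthy RAID 6 group, modeled here as 8 drives of which any 6 suffice. The helper names are illustrative.

```python
def margin(width, threshold, failed=0):
    """How many more simultaneous errors can be absorbed without data loss."""
    return width - threshold - failed

def survives(width, threshold, failed):
    return margin(width, threshold, failed) >= 0

DISP = (16, 10)     # Dispersal 16/10
RAID6 = (8, 6)      # 8 drives, two drives' worth of parity
per_site = 16 // 4  # four appliances in each of four sites

print("16/10 tolerates", margin(*DISP), "failures")                 # 6
print("RAID 6 tolerates", margin(*RAID6), "failures")               # 2
print("site + 2 more survivable:", survives(*DISP, per_site + 2))   # True
print("site + 3 more survivable:", survives(*DISP, per_site + 3))   # False
print("16/10 after 4 errors:", margin(*DISP, failed=4))             # 2
```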
As this example shows, Dispersal can tolerate three times as many simultaneous errors (six versus two), which is why rebuild times matter less. After two drives are lost, RAID 6 faces a white-knuckled rebuild with no margin left; Dispersal does not, since it could still absorb four additional errors, which is statistically unlikely to happen.
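To put a number on "statistically unlikely", the sketch below compares the chance of data loss once two components are already down, assuming each remaining component independently fails (or returns a URE) during the rebuild window with the same probability `q`. The value 0.01 is purely hypothetical, and real failures are often correlated, so treat the output as a shape rather than a prediction: RAID 6 loses data on any one further error among six survivors, while 16/10 needs five more among fourteen.

```python
from math import comb

def p_at_least(n, m, q):
    """P(at least m of n components fail), each independently with probability q."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(m, n + 1))

q = 0.01  # hypothetical per-component error probability during the rebuild window

# RAID 6 (8 drives) with 2 already lost: 1 more error among 6 survivors loses data
raid6_loss = p_at_least(6, 1, q)
# Dispersal 16/10 with 2 already lost: 5 more errors among 14 survivors needed
dispersal_loss = p_at_least(14, 5, q)

print(f"RAID 6: {raid6_loss:.1e}  Dispersal 16/10: {dispersal_loss:.1e}")
```

With q = 0.01, the RAID 6 figure is about 6%, while the 16/10 figure is on the order of 10^-7.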
Comparing the years without data loss for a 1 petabyte system (see Figure 2), Dispersal can tolerate much longer rebuild times while still delivering higher levels of data protection than RAID 5 or RAID 6.
For example, if a RAID 6 rebuild took 10 hours, Dispersal could tolerate rebuilds of over 6,000 hours while providing an equal level of data protection. This is an illustrative example only; in practice, rebuilding would be prioritized to complete in a much shorter time.

Figure 2
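One common way to reason about this tradeoff is the textbook mean-time-to-data-loss (MTTDL) approximation for a group of n devices that tolerates m concurrent failures, assuming independent, exponentially distributed failures and repairs: MTTDL ≈ MTTF^(m+1) / (n(n-1)...(n-m) · MTTR^m). The sketch applies it to a single RAID 6 group and a single 16/10 group with a hypothetical MTTF of 1,000,000 hours, then solves for the 16/10 repair time that matches RAID 6 with a 10-hour rebuild. It ignores UREs and overall system size, so it is not the model behind Figure 2 and does not reproduce the 6,000-hour figure; it only shows why extra fault tolerance multiplies the allowable rebuild time by orders of magnitude.

```python
from math import prod

def mttdl(n, m, mttf, mttr):
    """Approximate MTTDL in hours for n devices tolerating m concurrent
    failures: MTTF^(m+1) / (n(n-1)...(n-m) * MTTR^m)."""
    return mttf ** (m + 1) / (prod(range(n - m, n + 1)) * mttr ** m)

def matching_mttr(n, m, mttf, target_mttdl):
    """Solve the same formula for the MTTR that yields target_mttdl."""
    return (mttf ** (m + 1) / (prod(range(n - m, n + 1)) * target_mttdl)) ** (1 / m)

MTTF = 1_000_000  # hypothetical device MTTF, hours
raid6 = mttdl(n=8, m=2, mttf=MTTF, mttr=10)  # RAID 6 group, 10-hour rebuild
disp_mttr = matching_mttr(n=16, m=6, mttf=MTTF, target_mttdl=raid6)
print(f"RAID 6 MTTDL: {raid6:.2e} h")
print(f"16/10 matches it with an MTTR of about {disp_mttr:,.0f} h")
```

Under these assumptions the 16/10 group matches the RAID 6 group at a repair time of roughly 2,900 hours; a different MTTF or model moves that number considerably.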
RAID rebuilding performance
With RAID rebuilding, there is typically a choice of how much performance degradation is acceptable while the drive is rebuilt. Other factors affecting rebuild time include RAID stripe size, drive capacity, and the number of drives.
When setting rebuild priority, the tradeoff is between devoting system resources to rebuilding and devoting them to other I/O activities. When rebuilding takes precedence, rebuild times are shorter, but little capacity remains for other activities, potentially resulting in lost business productivity. When other I/O activities take precedence, rebuilds take longer, may be confined to off-peak hours, and leave data vulnerable to loss if additional errors are encountered.
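A simple linear model captures this tradeoff: if a fixed fraction of a drive's bandwidth is reserved for rebuilding, rebuild time scales inversely with that fraction while foreground I/O keeps the rest. The 100 MB/s drive bandwidth and the function name are hypothetical, and real rebuilds also pay for seeks, parity computation, and controller overhead, so the absolute times here are optimistic compared with the figures quoted earlier.

```python
def rebuild_tradeoff(capacity_bytes, disk_bytes_per_s, rebuild_share):
    """Split a drive's bandwidth between rebuild and foreground I/O.

    Returns (rebuild hours, foreground bandwidth left in MB/s).
    """
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in (0, 1]")
    hours = capacity_bytes / (disk_bytes_per_s * rebuild_share) / 3600
    foreground_mb_s = disk_bytes_per_s * (1 - rebuild_share) / 1e6
    return hours, foreground_mb_s

for share in (1.0, 0.5, 0.1):
    h, fg = rebuild_tradeoff(10**12, 100e6, share)
    print(f"rebuild share {share:.0%}: {h:5.1f} h to rebuild, {fg:5.1f} MB/s for users")
```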
Dispersal rebuilding performance
Looking back at the two rebuilding concerns (data protection, and balancing rebuilding with work productivity), Dispersal effectively addresses both. It provides extremely high levels of data protection because it is fault tolerant by design.
Regarding work productivity, IT staff can simply dedicate more system resources to other I/O activities and schedule rebuilding for off-peak hours. This may seem counterintuitive, since it means making rebuild times even longer. It is not, because the data protection levels are so much higher than with RAID.
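The off-peak strategy can be sketched as an hour-by-hour simulation in which the rebuild runs at a high rate overnight and only trickles during business hours. The rates, the 8 p.m. to 6 a.m. window, and the function name are hypothetical. With 2 MB/s during the day and 20 MB/s at night, a 1TB rebuild starting at midnight finishes after 27 wall-clock hours, while daytime users see almost no rebuild load.

```python
def hours_to_rebuild(total_bytes, peak_rate, offpeak_rate, offpeak_hours, start_hour=0):
    """Simulate a rebuild hour by hour and return the wall-clock hours it takes.

    Rates are in bytes per second; `offpeak_hours` is a set of hours of the
    day during which the faster `offpeak_rate` applies.
    """
    offpeak = {h % 24 for h in offpeak_hours}
    if peak_rate < 0 or offpeak_rate < 0:
        raise ValueError("rates must be non-negative")
    if peak_rate == 0 and (offpeak_rate == 0 or not offpeak):
        raise ValueError("rebuild would never finish")
    remaining, hour, elapsed = total_bytes, start_hour, 0
    while remaining > 0:
        rate = offpeak_rate if hour % 24 in offpeak else peak_rate
        remaining -= rate * 3600
        hour += 1
        elapsed += 1
    return elapsed

OFF_PEAK = set(range(0, 6)) | set(range(20, 24))  # 8 p.m. to 6 a.m.
hours = hours_to_rebuild(10**12, peak_rate=2_000_000, offpeak_rate=20_000_000,
                         offpeak_hours=OFF_PEAK)
print(hours)  # 27
```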
Dispersal rebuilds only data
Dispersal also rebuilds differently from most RAID systems. Typically, when a drive is replaced in a RAID array, the rebuild process reconstructs the entire drive regardless of whether it actually held data. This makes the rebuild take longer than necessary.
Dispersal, on the other hand, uses both CRC checks on reads and a background scrubbing process to identify data that needs to be rebuilt. Further, when a drive is replaced, a scan determines which data was actually stored on it, and only that data is rebuilt. This shortens the rebuild time compared with rebuilding an entire drive. Note, however, that Dispersal rebuilding requires reading more data per restored byte than a RAID system does.
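The CRC-and-scrub approach can be sketched as a toy slice store that records a CRC32 for each slice at write time, verifies it on every read, and periodically rescans everything in the background. The rebuild list is then the union of slices that should exist but are missing (for example, on a freshly replaced, empty drive) and slices that fail the scrub, so only real data is regenerated rather than the drive's full raw capacity. CRC32, the class, and the slice names are assumptions for illustration; the text does not specify which CRC is used or how the index of expected slices is obtained.

```python
import zlib

class SliceStore:
    """Toy slice store that keeps a CRC32 alongside each slice."""

    def __init__(self):
        self.slices = {}  # slice name -> (payload bytes, CRC computed at write)

    def write(self, name, payload):
        self.slices[name] = (payload, zlib.crc32(payload))

    def read(self, name):
        payload, crc = self.slices[name]
        if zlib.crc32(payload) != crc:  # CRC check on every read
            raise OSError(f"CRC mismatch on {name}")
        return payload

    def scrub(self):
        """Background pass: names whose payload no longer matches its CRC."""
        return [n for n, (p, c) in self.slices.items() if zlib.crc32(p) != c]

def rebuild_list(store, expected_names):
    """Slices to regenerate: expected but missing, plus those failing the scrub."""
    missing = set(expected_names) - set(store.slices)
    return sorted(missing | set(store.scrub()))

store = SliceStore()
store.write("seg1.s3", b"alpha")
store.write("seg2.s3", b"beta")
payload, crc = store.slices["seg2.s3"]
store.slices["seg2.s3"] = (b"bexa", crc)  # simulate silent corruption
print(rebuild_list(store, ["seg1.s3", "seg2.s3", "seg3.s3"]))  # ['seg2.s3', 'seg3.s3']
```

Here `seg3.s3` is missing and `seg2.s3` has been silently corrupted, so only those two slices are queued for rebuilding.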
Conclusion
Dispersal is much more fault tolerant than RAID 5 or RAID 6, and it does not depend on the fast rebuild times that RAID requires. As a result, rebuilding can occur without significant performance degradation or risk to data integrity.