The replication scheme guarantees such reads always see the same data.

2. Iterate all records sequentially in a stream on partition load. Each partition has two additional streams (metadata and commit log). These are the only streams that the partition layer will read sequentially from a starting point to the very last record of a stream. This operation only occurs when the partition is loaded (explained in Section 5). The partition layer ensures that no useful appends from the partition layer will happen to these two streams during partition load. Then the partition and stream layers together ensure that the same sequence of records is returned on partition load.

At the start of a partition load, the partition server sends a “check for commit length” to the primary EN of the last extent of these two streams. This checks whether all the replicas are available and that they all have the same length. If not, the extent is sealed, and during partition load, reads are performed only against a replica sealed by the SM. This ensures that the partition load will see all of its data and the exact same view, even if we were to repeatedly load the same partition, reading from different sealed replicas for the last extent of the stream.

4.4 Erasure Coding Sealed Extents

To reduce the cost of storage, WAS erasure codes sealed extents for Blob storage. WAS breaks an extent into N roughly equal-sized fragments at block boundaries. Then, it adds M error-correcting code fragments, using Reed-Solomon as the erasure coding algorithm. As long as it does not lose more than M fragments (across the data fragments and code fragments), WAS can recreate the full extent.

Erasure coding sealed extents is an important optimization, given the amount of data we are storing. It reduces the cost of storing data from three full replicas within a stamp, which is three times the original data, to only 1.3x–1.5x the original data, depending on the number of fragments used.
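The "lose at most M of N + M fragments" property can be illustrated with a small sketch. It is not WAS's codec. It assumes a systematic Reed-Solomon-style code built by polynomial evaluation over the prime field GF(257), whereas production codecs typically use GF(2^8) arithmetic. It also splits at byte offsets rather than at block boundaries. The names `encode`, `decode`, and `storage_overhead` are hypothetical. The storage cost is (N + M) / N times the data. For example, hypothetical choices of N = 10, M = 3 give 1.3x, and N = 6, M = 3 give 1.5x.

```python
# Reed-Solomon-style erasure code over GF(257): a teaching sketch, NOT the WAS codec.
# Assumption: production RS codes usually work over GF(2^8); GF(257) keeps the math simple.
P = 257  # prime modulus; every byte value 0..255 is a field element


def storage_overhead(n: int, m: int) -> float:
    """Bytes stored per byte of data: N data fragments plus M code fragments."""
    return (n + m) / n


def _eval_through(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(extent: bytes, n: int, m: int):
    """Split an extent into n equal data fragments (zero-padded) and add m code fragments.

    Data fragment i holds the polynomial's values at x = i; code fragment j holds x = n + j.
    Code symbols may equal 256, so fragments are kept as lists of ints rather than bytes."""
    frag_len = -(-len(extent) // n)  # ceiling division
    padded = extent.ljust(frag_len * n, b"\0")
    data = [list(padded[i * frag_len:(i + 1) * frag_len]) for i in range(n)]
    code = [[0] * frag_len for _ in range(m)]
    for pos in range(frag_len):
        points = [(i, data[i][pos]) for i in range(n)]
        for j in range(m):
            code[j][pos] = _eval_through(points, n + j)
    return data + code, len(extent)


def decode(surviving: dict, n: int, length: int) -> bytes:
    """Rebuild the extent from any n surviving fragments (fragment index -> symbol list)."""
    if len(surviving) < n:
        raise ValueError("more than M fragments lost; the extent cannot be recreated")
    chosen = sorted(surviving.items())[:n]
    frag_len = len(chosen[0][1])
    out = bytearray()
    for i in range(n):  # recompute data fragment i, i.e. the polynomial at x = i
        for pos in range(frag_len):
            out.append(_eval_through([(idx, frag[pos]) for idx, frag in chosen], i))
    return bytes(out[:length])


extent = b"hello windows azure storage!"
fragments, length = encode(extent, n=4, m=2)
# Lose one data fragment (1) and one code fragment (4); any 4 of the 6 suffice.
survivors = {i: f for i, f in enumerate(fragments) if i not in (1, 4)}
print(decode(survivors, n=4, length=length) == extent)  # True
```

Because any N evaluations determine a polynomial of degree below N, it does not matter which M fragments are missing. That is also why reconstruction can use whichever N fragments are available.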
In addition, erasure coding actually increases the durability of the data when compared to keeping three replicas within a stamp.

4.5 Read Load-Balancing

When reads are issued for an extent that has three replicas, they are submitted with a “deadline” value, which specifies that the read should not be attempted if it cannot be fulfilled within the deadline. If the EN determines that the read cannot be fulfilled within the time constraint, it immediately replies to the client that the deadline cannot be met. This mechanism allows the client to select a different EN to read the data from, likely allowing the read to complete faster.

This method is also used with erasure-coded data. When reads cannot be serviced in a timely manner because the spindle holding the data fragment is heavily loaded, the read may be serviced faster by doing a reconstruction rather than reading that data fragment. In this case, reads (for the range of the fragment needed to satisfy the client request) are issued to all fragments of an erasure-coded extent, and the first N responses are used to reconstruct the desired fragment.
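Both read paths can be modeled with a small simulation. Everything in it is a hypothetical stand-in for illustration, not WAS's client logic. This includes `ExtentNode`, its `estimated_ms` latency prediction, the choice to try replicas in list order, and the simulated per-fragment response times. The sketch shows only which fragments a reconstruction would use, not the decoding itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ExtentNode:
    """Hypothetical EN model; estimated_ms is the latency it predicts for a new read."""
    name: str
    estimated_ms: float

    def read(self, deadline_ms: float) -> Optional[str]:
        # Reply at once with "deadline cannot be met" (None) instead of queueing the read.
        if self.estimated_ms > deadline_ms:
            return None
        return f"data from {self.name}"


def read_replicated(replicas: List[ExtentNode], deadline_ms: float) -> Optional[str]:
    """Try replicas in order (an assumption); a fast rejection lets the client move on."""
    for en in replicas:
        result = en.read(deadline_ms)
        if result is not None:
            return result
    return None  # every replica declined; a real client would need a fallback policy


def read_erasure_coded(target: int, latencies_ms: Dict[int, float], n: int,
                       deadline_ms: float) -> Tuple[str, List[int]]:
    """Read fragment `target` directly if it can meet the deadline; otherwise issue the
    range read to all N + M fragments and reconstruct from the first N responses.

    latencies_ms maps fragment index -> simulated response time (hypothetical)."""
    if latencies_ms[target] <= deadline_ms:
        return "direct", [target]
    arrival_order = sorted(latencies_ms, key=latencies_ms.__getitem__)
    return "reconstruct", sorted(arrival_order[:n])


replicas = [ExtentNode("en1", 120.0), ExtentNode("en2", 8.0), ExtentNode("en3", 15.0)]
print(read_replicated(replicas, deadline_ms=20.0))  # data from en2

latencies = {0: 5.0, 1: 400.0, 2: 7.0, 3: 9.0, 4: 6.0, 5: 30.0}  # N = 4, M = 2
print(read_erasure_coded(1, latencies, n=4, deadline_ms=50.0))  # ('reconstruct', [0, 2, 3, 4])
```

In the erasure-coded case, fragment 1 sits behind a slow spindle, so the sketch skips it. It takes the four earliest responses, including code fragment 4, which are enough to rebuild fragment 1's range.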