If you would like to read any other chapters of this blog series, click the links below:
- Part 1 – Introduction and Licensing
- Part 2 – Architecture and Hardware
- Part 3 – Data Availability
- Part 4 – Fault Domains and Stretched Clusters
- Part 5 – Failure Events
- Part 6 – Compression, Deduplication and QoS
- Part 7 – Monitoring and Reporting
This post looks at the data reduction features available with vSAN and how they can reduce the total amount of storage required. Let’s begin with a quick review of some of the vSAN storage concepts discussed so far:
- vSAN pools local storage from hosts within a cluster into a single resilient datastore
- The available capacity reported in vCenter is the RAW capacity of all capacity devices within the cluster
- Only capacity devices count towards this total
- Caching devices do not count towards this total
- Redundancy is configured at a policy level
- Hardware RAID is not used with vSAN
- Configurations include hybrid and all-flash options
- Erasure coding reduces the amount of storage required
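The capacity points above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, host counts and device sizes are assumptions made up for the example, and it ignores real-world allowances such as slack space and metadata. The multipliers are the standard ones for mirroring (2x), RAID-5 erasure coding (3 data + 1 parity) and RAID-6 erasure coding (4 data + 2 parity).

```python
# Illustrative sketch only: names and figures are assumptions, not vSAN APIs.

# Raw capacity consumed per unit of data stored, for each protection scheme.
POLICY_OVERHEAD = {
    "RAID-1 (FTT=1)": 2.0,    # two full copies of the data
    "RAID-5 (FTT=1)": 4 / 3,  # 3 data + 1 parity
    "RAID-6 (FTT=2)": 6 / 4,  # 4 data + 2 parity
}


def raw_capacity_tb(hosts, capacity_devices_per_host, device_size_tb):
    """Raw datastore size: capacity devices only, cache devices are excluded."""
    return hosts * capacity_devices_per_host * device_size_tb


def usable_capacity_tb(raw_tb, policy):
    """Approximate space left for data once the policy overhead is paid."""
    return raw_tb / POLICY_OVERHEAD[policy]


# Hypothetical cluster: 6 hosts, each with 4 x 2 TB capacity devices.
raw = raw_capacity_tb(hosts=6, capacity_devices_per_host=4, device_size_tb=2.0)
for policy in POLICY_OVERHEAD:
    usable = usable_capacity_tb(raw, policy)
    print(f"{policy}: {usable:.1f} TB usable of {raw:.1f} TB raw")
```

For this made-up cluster the raw figure is 48 TB, of which roughly 24 TB is usable with mirroring, 36 TB with RAID-5 and 32 TB with RAID-6.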
With the above points fresh in our minds it’s time to review some additional data reduction technologies supported with vSAN.
Deduplication and Compression
These features were introduced with the release of vSAN 6.2 in 2016. As with erasure coding, they are only available with all-flash configurations. The main purpose of these data reduction technologies is to reduce the overall amount of physical storage capacity required. Depending on which marketing material you read, you’ll see anything from a 2x reduction up to a 7x reduction. In the real world, the saving you’ll achieve very much depends on the source data: some types of data compress well and others not so well.
Deduplication and compression are enabled at the cluster level and this is quite a simple process to complete – you just click a box.
Beware: if this is enabled on a cluster with existing data, a rolling format of every disk is required, which may take a long time. There is some good news though. You won’t need to shut down the virtual machines first, and no downtime is required.
Both deduplication and compression are applied after the incoming write has been acknowledged within the vSAN cache tier, which minimises the performance impact on the workload. Cold data held in the cache tier that is ready to be de-staged is first moved to memory, where it is deduplicated and then compressed. The data is then written to the capacity tier.
The deduplication algorithm works on fixed 4 KB blocks at a disk group level. This means multiple copies of the same block within a disk group are reduced to one copy, but identical blocks held in different disk groups are not deduplicated against each other.
Compression is applied after deduplication and before the data is written to the capacity tier. Whilst there is a significant overhead when using these types of data reduction features, vSAN has some built-in intelligence which allows it to be very efficient: it will only apply compression to a block if it can be reduced from 4 KB to 2 KB or less. If it can’t, the block is written uncompressed to avoid wasting compute resources.
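The de-staging flow described above can be sketched as a toy model. This is not vSAN code: the `DiskGroup` class is hypothetical, and SHA-1 and zlib are used purely as stand-ins because they ship with Python. What it does show is the behaviour from the text: fixed 4 KB blocks, deduplication scoped to a single disk group, and compression kept only when the result fits in 2 KB.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096          # fixed 4 KB deduplication block
COMPRESS_THRESHOLD = 2048  # keep the compressed form only if it is 2 KB or less


class DiskGroup:
    """Toy model of one disk group's capacity tier (hypothetical, for illustration)."""

    def __init__(self):
        self.stored = {}         # fingerprint -> bytes as written to capacity
        self.logical_blocks = 0  # blocks de-staged, before any reduction

    def destage(self, block):
        """Deduplicate, then compress, one 4 KB block leaving the cache tier."""
        assert len(block) == BLOCK_SIZE
        self.logical_blocks += 1
        fingerprint = hashlib.sha1(block).hexdigest()  # stand-in hash
        if fingerprint in self.stored:
            return "deduplicated"      # an identical block already exists here
        compressed = zlib.compress(block)              # stand-in compressor
        if len(compressed) <= COMPRESS_THRESHOLD:
            self.stored[fingerprint] = compressed
            return "compressed"
        self.stored[fingerprint] = block               # not worth compressing
        return "uncompressed"

    def physical_bytes(self):
        return sum(len(data) for data in self.stored.values())


# Two independent disk groups: deduplication never spans them.
group_a, group_b = DiskGroup(), DiskGroup()
zeros = bytes(BLOCK_SIZE)  # highly compressible
# Deterministic high-entropy block: 128 SHA-256 digests of 32 bytes each.
noisy = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))

print(group_a.destage(zeros))  # compressed
print(group_a.destage(zeros))  # deduplicated
print(group_a.destage(noisy))  # uncompressed
print(group_b.destage(zeros))  # compressed (a different disk group)
print(group_a.logical_blocks * BLOCK_SIZE, "logical bytes,",
      group_a.physical_bytes(), "physical bytes in group A")
```

Note that the block of zeros is stored once in each disk group, which is why spreading identical data across many disk groups lowers the achievable deduplication ratio.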
When using deduplication and compression there can be an impact on performance and latency. Because this functionality is only available with all-flash configurations, the additional resource requirements are offset against the lower overall capacity requirements, making all-flash configurations more cost effective.
Quality of Service
With any environment that utilises some form of shared resources, there is always a risk that a busy workload can negatively affect the performance of another. This is commonly referred to as a “noisy neighbour”. With vSAN it is possible to place a limit on how many IOPS a virtual machine can consume, preventing it from taking more than its share of resources. This is configured using storage policies, and don’t forget that policies can be applied or amended on the fly, with any changes being implemented straight away.
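To illustrate the idea of an IOPS limit, here is a minimal sketch of a generic per-second cap. It is an assumption-laden toy, not a description of how vSAN enforces the policy internally: the `IopsLimiter` class and its one-second counting window are invented for the example, and timestamps are passed in explicitly so the behaviour is repeatable.

```python
class IopsLimiter:
    """Toy per-VM IOPS cap using one-second windows (hypothetical class)."""

    def __init__(self, iops_limit):
        self.iops_limit = iops_limit
        self.window = None  # the whole second currently being counted
        self.issued = 0     # I/Os admitted during that second

    def admit(self, now):
        """Return True if an I/O arriving at time `now` (in seconds) may proceed."""
        second = int(now)
        if second != self.window:
            # A new second has started, so the budget is refreshed.
            self.window, self.issued = second, 0
        if self.issued < self.iops_limit:
            self.issued += 1
            return True
        return False  # over the limit; a real scheduler would delay the I/O


# A noisy VM limited to 3 IOPS issues five I/Os in the first second,
# then one more at the start of the next second.
noisy_vm = IopsLimiter(iops_limit=3)
arrivals = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0]
print([noisy_vm.admit(t) for t in arrivals])
# [True, True, True, False, False, True]
```

The fourth and fifth I/Os exceed the budget for that second, which is capacity the neighbouring workloads get to keep.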
The final part in this series will look at how the health and configuration of vSAN are monitored, along with where to find information such as capacity utilisation and performance statistics.