AWS vs Azure vs Google Cloud Platform - Storage & Content Delivery
In this series, we're comparing cloud services from AWS, Azure and Google Cloud Platform. A full breakdown and comparison of cloud providers and their services is available in this handy poster.
We have assessed services across three typical migration strategies:
- Lift and shift - the cloud service can support running legacy systems with minimal change
- Consume PaaS services - the cloud offering is a managed service that can be consumed by existing solutions with minimal architectural change
- Re-architect for cloud - the cloud technology is typically used in solution architectures that have been optimised for cloud
And have grouped services into 10 categories:
- Compute
- Storage & Content Delivery
- Database
- Analytics & Big Data
- Internet of Things
- Mobile Services
- Networking
- Security & Identity
- Management & Monitoring
- Hybrid
In this post we are looking at...
Storage & Content Delivery
With any cloud deployment it is important to match the right storage solution to each workload.
AWS
Amazon's native cloud object storage solution is S3, offering flexible and cost-effective storage. S3 comes with high availability and automatic replication across multiple facilities within a region, although this can be downgraded to reduced redundancy to save on costs. It allows users to store objects up to 5TB in size across three storage classes: Standard, Infrequent Access (IA) and Glacier. Both Standard and IA offer low latency, with IA having a lower storage cost but higher retrieval and request charges. Glacier, on the other hand, has very low storage costs but relatively high retrieval costs; additional restore charges may also apply depending on the size of the data being restored, so predicting costs can be tricky. Also, since Glacier is designed specifically for data archives, it comes with a high latency of several hours to receive the first byte of a retrieval. A really nice feature of S3 is automatic object versioning: each version is addressable, so it can be retrieved at any time.
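As a rough sketch of the API, switching versioning on and writing to the IA class with Python's boto3 might look like the following (the bucket and key names are made up):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-archive-bucket"  # hypothetical bucket name

# Enable automatic versioning so every overwrite keeps an addressable prior version.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Write an object to the Infrequent Access class: cheaper at rest, dearer to read.
s3.put_object(
    Bucket=bucket,
    Key="reports/2016.csv",
    Body=b"region,revenue\neu-west-1,42\n",
    StorageClass="STANDARD_IA",
)
```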
It is possible to create file shares through Elastic File System (EFS). EFS can be mounted on Linux EC2 instances via the NFS v4.1 protocol, making it a good fit for Linux lift-and-shift deployments. Storage capacity is scaled automatically based on usage, but NFS has some issues in tricky corner cases (see the comments below).
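Provisioning a file system is only a couple of API calls; a minimal boto3 sketch (with a hypothetical region and subnet) might be:

```python
import boto3

efs = boto3.client("efs", region_name="eu-west-1")  # hypothetical region

# Create a file system; capacity grows and shrinks automatically with usage.
fs = efs.create_file_system(CreationToken="shared-app-data")

# Add a mount target in a VPC subnet so EC2 instances there can mount it over NFS v4.1.
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",  # hypothetical subnet
)
```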
Organisations can choose to extend their on-premises storage to AWS through Storage Gateway. It is possible to mount volumes via iSCSI and have Storage Gateway manage data retrieval for you. Data is stored in S3, with frequently accessed data cached locally. It's also possible to keep all data locally, with backup snapshots sent to S3.
Cloud migrations often require large volumes of on-premises data to be shipped to the cloud; however, moving petabytes of data over the internet can be slow and expensive. To solve this, AWS offers Import/Export Snowball. Amazon will ship a physical appliance that can be plugged into the network and used to securely copy and transport large volumes of data.
Finally, AWS also offers CloudFront, a managed content delivery network.
Azure
Azure's core Storage offering includes an object (blob) store which is very similar to S3. It offers two classes of storage: Hot and Cool. The main difference between the two is price: Cool comes at a lower storage cost but with additional read and write charges. Latency for both tiers is the same. Geo-redundancy comes by default, with options to dial this back to local redundancy or up to read-access geo-redundancy. Azure allows users to snapshot blobs, which provides some level of versioning, but unlike AWS this is not automatic.
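Taking and reading back a snapshot with the Python azure-storage-blob SDK might look something like this (the connection string, container and blob names are placeholders):

```python
from azure.storage.blob import BlobClient

conn_str = "<storage-account-connection-string>"  # hypothetical placeholder

blob = BlobClient.from_connection_string(conn_str, container_name="docs", blob_name="report.pdf")

# Snapshots are explicit: take one before overwriting to keep a point-in-time copy.
snapshot = blob.create_snapshot()

# Read the point-in-time copy back later via its snapshot timestamp.
old = BlobClient.from_connection_string(
    conn_str,
    container_name="docs",
    blob_name="report.pdf",
    snapshot=snapshot["snapshot"],
)
print(old.download_blob().readall())
```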
Azure File Storage provides fully managed file shares using the SMB 3.0 protocol. File shares can be mounted either from Azure Virtual Machines or from on-premises servers. Data can also be accessed through a REST API for integration with other cloud services.
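To illustrate that REST-style access, here is a minimal sketch using the azure-storage-file-share Python package (the share and file names are made up):

```python
from azure.storage.fileshare import ShareClient

conn_str = "<storage-account-connection-string>"  # hypothetical placeholder

# Create a managed SMB file share and write to it over HTTPS; no mount needed.
share = ShareClient.from_connection_string(conn_str, share_name="teamshare")
share.create_share()

file_client = share.get_file_client("notes.txt")
file_client.upload_file(b"Reachable over SMB 3.0 from VMs, and over REST from anywhere.")
```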
Microsoft has also recently introduced Data Lake Store, a fully managed HDFS-compatible store specifically designed to support high-frequency, low-latency analytical workloads. Data Lake Store is targeted at organisations who need to store high-fidelity data and perform batch and real-time analytics over it. Pricing is a smidgen under blob Hot storage (per GB).
Azure allows organisations to automatically back up on-premises data through Azure Backup. The service works by running a backup agent on-premises which is responsible for synchronising selected data to Azure.
For more comprehensive hybrid scenarios, StorSimple allows organisations to build hybrid SAN solutions. StorSimple comes with a dedicated device that includes a hybrid storage array with SSDs and HDDs, together with redundant controllers and automatic failover capabilities. The device is installed on-premises and can be configured to handle a range of hybrid scenarios including DR, data mobility and storage consolidation.
For bulk data import and export, Azure has Import/Export, which utilises secure physical hard drive shipping to ensure data is transferred quickly, securely and cost-effectively.
Like AWS, Azure also comes with a fully managed Content Delivery Network.
Google Cloud Platform
Google provides comparable object storage services with Cloud Storage, which offers Standard, Durable Reduced Availability and Nearline storage classes. Durable Reduced Availability is the cheapest option for data that isn't super-critical, while Nearline offers cold storage with lower storage costs but additional access charges. Like AWS, Google supports automatic object versioning.
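With the google-cloud-storage Python client, switching versioning on and listing an object's generations might look like this (the bucket and object names are hypothetical):

```python
from google.cloud import storage

client = storage.Client()  # uses application default credentials
bucket = client.get_bucket("my-archive-bucket")  # hypothetical bucket name

# Turn on versioning; overwritten or deleted objects are kept as noncurrent generations.
bucket.versioning_enabled = True
bucket.patch()

# List every stored generation of a given object.
for blob in client.list_blobs(bucket, prefix="reports/2016.csv", versions=True):
    print(blob.name, blob.generation)
```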
For file shares, Google relies on Compute Engine instances configured as a single-node file server, or on GlusterFS, an open source distributed file system.
Google does not currently offer any dedicated hybrid storage solutions, and physical bulk data import/export is only available through third parties. Google does, however, have a simple data transfer option for organisations migrating data from Amazon S3.
Cloud CDN is Google's content delivery network for applications running on Compute Engine.
Conclusion
All providers have good object storage options, so storage alone is unlikely to be the deciding factor when choosing a cloud provider. The exception is perhaps hybrid scenarios, where Azure and AWS clearly win. AWS and Google's support for automatic versioning is a great feature that is currently missing from Azure; however, Microsoft's fully managed Data Lake Store offers an additional option that will appeal to organisations looking to run large-scale analytical workloads. If you are prepared to wait a few hours for your data and you have considerable amounts of it, then AWS Glacier storage might be a good option.
If you use the common programming patterns for atomic updates and consistency, such as etags and the If-Match family of headers, then you should be aware that AWS does not support them, though Google and Azure do. Azure also supports blob leasing, which can be used to provide a distributed lock (as used in Endjin's leasing framework).
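As a sketch of how this plays out on Azure with the Python azure-storage-blob SDK (the names and the 15-second lease duration are illustrative):

```python
from azure.core import MatchConditions
from azure.storage.blob import BlobClient

conn_str = "<storage-account-connection-string>"  # hypothetical placeholder
blob = BlobClient.from_connection_string(conn_str, container_name="config", blob_name="settings.json")

# Optimistic concurrency: read the blob and remember its etag...
downloader = blob.download_blob()
settings = downloader.readall()
etag = downloader.properties.etag

# ...then write back only if nobody else has modified it in the meantime.
blob.upload_blob(
    settings + b"\nupdated=true",
    overwrite=True,
    etag=etag,
    match_condition=MatchConditions.IfNotModified,
)

# Pessimistic alternative: take a short lease to act as a distributed lock.
lease = blob.acquire_lease(lease_duration=15)
try:
    blob.upload_blob(b"exclusive update", overwrite=True, lease=lease)
finally:
    lease.release()
```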
A WORD OF WARNING: storage has never been cheaper, with headline storage costs looking ridiculously cheap, but pricing can quickly get complicated and unpredictable, especially if you plan to move large volumes of data. For object storage, all three providers charge per GB of data stored, plus a request charge (depending on the nature of the request), plus an egress charge per GB. Storage and egress charges vary depending on the amount of data involved; the exception is Google, which offers a flat storage charge regardless of how much data you have. If you are versioning objects then you will pay for those versions as well. Google and AWS have minimum charge periods for their archive storage offerings, but be careful: these kick in when you move data, as moving effectively deletes and re-creates objects. On top of all that, AWS will charge an additional premium if you retrieve more than 5% of your average Glacier storage. Confused? I don't blame you!
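To make the structure of those charges concrete, here is a toy monthly cost model in Python; the rates are hypothetical placeholders, not any provider's real prices:

```python
# A toy cost model reflecting the per-GB + per-request + egress structure above.
def monthly_object_storage_cost(
    stored_gb: float,
    requests: int,
    egress_gb: float,
    per_gb_rate: float = 0.02,       # $/GB-month -- hypothetical rate
    per_10k_requests: float = 0.05,  # $ per 10,000 requests -- hypothetical rate
    egress_rate: float = 0.08,       # $/GB transferred out -- hypothetical rate
) -> float:
    return (
        stored_gb * per_gb_rate
        + (requests / 10_000) * per_10k_requests
        + egress_gb * egress_rate
    )

# Example: 10 TB stored, a million requests, 500 GB of egress in a month.
print(f"${monthly_object_storage_cost(10_240, 1_000_000, 500):,.2f}")
```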
Next up we will be looking at how the three providers match up with their database offerings.