Click here to learn more about Gilad David Maayan.
Azure Storage is Microsoft’s secure, scalable, durable, and highly available cloud storage service. It offers three storage account types, five storage types, four data redundancy levels, and three storage tiers. This article focuses on Azure’s Blob Storage service, including Blob types, Blob tiers, and best practices for managing Blob cost and availability.
What is Azure Blob Storage?
Azure Blob Storage is a cloud storage service for Binary Large Objects (BLOB). Blob Storage enables you to store large amounts of unstructured data. Azure’s blob storage service includes the following components:
- Blob: A file of any type and size.
- Container: A group of blobs. There is no limit to the number of blobs in a container. The name of a container must always be lowercase.
- Storage Account: Azure offers three storage account types – General Purpose v1 (GPv1), General Purpose v2 (GPv2), and a dedicated blob storage account.
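A GPv2 account supports blobs alongside the other storage options: Azure file storage, queues, tables, and disks. A dedicated blob storage account holds only block blobs and append blobs and does not offer a premium performance tier. GPv1 is the legacy account type for blobs, files, tables, and queues; it does not support the storage tiers covered later in this article.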
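The lowercase rule for container names is easy to check before a request is ever sent. The sketch below is a minimal validator, assuming the commonly documented naming rules (3 to 63 characters, lowercase letters, digits, and single hyphens that sit between letters or digits). The function name `is_valid_container_name` is made up for this illustration and is not part of any Azure SDK.

```python
import re

# One or more lowercase alphanumeric groups, separated by single hyphens.
# This rules out uppercase letters, leading/trailing hyphens, and "--".
_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` looks like a valid container name (assumed rules)."""
    return 3 <= len(name) <= 63 and _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-container", "MyContainer", "ab", "logs--2024", "-logs"]:
    print(candidate, is_valid_container_name(candidate))
```

Checking names on the client side like this avoids a round trip to the service just to learn that a name was rejected.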
Blobs can be further categorized into three types.
Block blobs store binary media files, documents, or text. A single block blob can hold up to 50,000 blocks of up to 100 MB each, so the total blob size can reach about 4.75 TB.
Append blobs are optimized for append operations such as logging. The main difference between append blobs and block blobs is capacity: each block in an append blob can hold only up to 4 MB of data. With the same 50,000-block limit, an append blob is limited to a total size of about 195 GB.
Page blobs have a storage capacity of about 8 TB and are optimized for frequent random read and write operations. There are two page blob categories, Standard and Premium. Standard page blobs are used for average Virtual Machine (VM) read/write operations, while Premium page blobs are used for intensive VM operations. Page blobs are useful for all Azure VM storage disks, including the operating system disk.
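The size limits quoted for block and append blobs follow directly from the block count and the block size. The short calculation below reproduces them, assuming the figures cited in this article (50,000 blocks, 100 MB per block for block blobs, 4 MB per block for append blobs) and binary units. The helper name `max_blob_size` is illustrative only.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000  # block limit cited in this article


def max_blob_size(block_size_bytes: int, max_blocks: int = MAX_BLOCKS) -> int:
    """Largest blob that fits in `max_blocks` blocks of the given size."""
    return block_size_bytes * max_blocks


block_blob_max = max_blob_size(100 * MIB)   # block blob: 100 MiB blocks
append_blob_max = max_blob_size(4 * MIB)    # append blob: 4 MiB blocks

print(f"block blob:  {block_blob_max / TIB:.2f} TiB")   # block blob:  4.77 TiB
print(f"append blob: {append_blob_max / GIB:.1f} GiB")  # append blob: 195.3 GiB
```

The results line up with the roughly 4.75 TB and 195 GB limits mentioned above.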
Azure Blob Storage Tiers
Blob data is stored durably in every tier. What changes from tier to tier is the balance between the cost of keeping data stored and the cost of accessing it, so you should pick a tier based on how often you read the data and how long you plan to keep it. There are three types of storage tiers.
The cool tier is used for infrequently accessed data, such as short-term backups. It offers lower storage costs than the hot tier in exchange for higher access costs, which pays off when you do not access the stored data on a regular basis.
The hot tier is optimal for frequent reading and writing access to stored data. The hot storage tier has the highest storage cost and the lowest access cost out of the three storage tiers.
The archive tier is used for long-term backup and raw data storage. The archive tier stores data offline, so you cannot access the data immediately; retrieving it can take several hours. As a result, the archive tier has the highest retrieval costs and the lowest storage costs of the three tiers.
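The trade-off between storage cost and access cost can be made concrete with a small cost model. The sketch below is only an illustration: the prices in `PRICES` are invented placeholders, not Azure's actual rates, and the model ignores operation counts, early-deletion fees, and the retrieval delay of the archive tier. The names `TierPrice`, `monthly_cost`, and `cheapest_tier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    storage_per_gb: float  # monthly cost of keeping 1 GB stored
    read_per_gb: float     # cost of reading 1 GB back


# HYPOTHETICAL prices for illustration only; check the current Azure price list.
PRICES = {
    "hot": TierPrice(storage_per_gb=0.020, read_per_gb=0.000),
    "cool": TierPrice(storage_per_gb=0.010, read_per_gb=0.010),
    "archive": TierPrice(storage_per_gb=0.002, read_per_gb=0.050),
}


def monthly_cost(tier: str, stored_gb: float, read_gb: float) -> float:
    """Storage cost plus read cost for one month in the given tier."""
    price = PRICES[tier]
    return stored_gb * price.storage_per_gb + read_gb * price.read_per_gb


def cheapest_tier(stored_gb: float, read_gb: float) -> str:
    """Tier with the lowest modeled monthly cost for this access pattern."""
    return min(PRICES, key=lambda tier: monthly_cost(tier, stored_gb, read_gb))


# 1,000 GB stored, with increasing amounts read back per month.
for read_gb in (0, 500, 5000):
    print(read_gb, cheapest_tier(1000, read_gb))
```

With these placeholder prices, data that is never read is cheapest in the archive tier, occasionally read data in the cool tier, and heavily read data in the hot tier, which mirrors the guidance above.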
6 Best Practices for Blob Storage
Azure charges users for three components – storage space, traffic, and operations on stored data. Each component has a different influence on the costs and availability of data. The list below reviews the essential best practices for controlling and maintaining Blob storage costs and availability.
- Define the Type of Content
When you upload files to blob storage, they are usually stored with the content type application/octet-stream by default. The problem is that most browsers download this type of file instead of displaying it. This is why you have to set the correct content type when uploading videos or images. To fix files that are already uploaded, you have to go through each blob and update its content type property.
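One way to pick the content type is to derive it from the file extension before uploading. The sketch below uses Python's standard `mimetypes` module and falls back to application/octet-stream when the extension is unknown. The function name `guess_content_type` is an assumption made for this example.

```python
import mimetypes

DEFAULT_CONTENT_TYPE = "application/octet-stream"


def guess_content_type(filename: str) -> str:
    """Guess a MIME type from the file extension, with a generic fallback."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or DEFAULT_CONTENT_TYPE


for name in ["photo.png", "index.html", "data.unknownext123"]:
    print(name, "->", guess_content_type(name))
```

If you use the azure-storage-blob package for Python, the resulting value would typically be passed as `ContentSettings(content_type=...)` through the `content_settings` argument of `upload_blob`; verify this against the SDK version you have installed.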
- Define the Cache-Control Header
The HTTP Cache-Control header allows you to improve availability. In addition, the header reduces the number of transactions made against the storage account, because cached responses do not have to be fetched again. For example, a Cache-Control header on a static website hosted on Azure blob storage can decrease the server traffic load by keeping a cached copy on the client side.
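A Cache-Control value is just a short string of directives. The sketch below builds one from a lifetime in seconds, using the standard `public`/`private` and `max-age` directives. The helper `cache_control_value` is a made-up name for illustration, and the one-hour lifetime is an arbitrary example rather than a recommendation.

```python
def cache_control_value(max_age_seconds: int, public: bool = True) -> str:
    """Build a Cache-Control header value such as 'public, max-age=3600'."""
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must not be negative")
    visibility = "public" if public else "private"
    return f"{visibility}, max-age={max_age_seconds}"


# Let browsers and shared caches keep a static asset for one hour.
print(cache_control_value(3600))
```

In the azure-storage-blob package for Python, a string like this would normally go into `ContentSettings(cache_control=...)` when uploading or updating a blob; treat that as something to confirm in the SDK documentation.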
- Parallel Uploads and Downloads
Uploading large volumes of data to blob storage is time-consuming and affects the performance of an application. Parallel uploads can improve the upload speed for both Block blobs and Page blobs. For example, a 70 GB upload that would take approximately 1,700 hours sequentially can be reduced to about 8 hours with a parallel upload.
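The idea behind a parallel upload is to split the data into blocks, send the blocks concurrently, and then commit them in their original order. The sketch below shows that pattern with a thread pool and an in-memory stand-in for the storage service. `upload_in_parallel`, `fake_upload`, and the block ID format are all hypothetical; a real upload would call the storage SDK instead of writing to a dictionary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut `data` into consecutive chunks of at most `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload_in_parallel(
    data: bytes,
    block_size: int,
    upload_block: Callable[[str, bytes], None],
    max_workers: int = 4,
) -> List[str]:
    """Upload all blocks concurrently and return the block IDs in commit order."""
    blocks = split_into_blocks(data, block_size)
    block_ids = [f"block-{i:06d}" for i in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every upload and re-raises the first failure, if any.
        list(pool.map(upload_block, block_ids, blocks))
    return block_ids


# In-memory stand-in for the storage service (illustration only).
store: Dict[str, bytes] = {}


def fake_upload(block_id: str, chunk: bytes) -> None:
    store[block_id] = chunk


data = bytes(range(256)) * 40  # 10,240 bytes of sample data
ids = upload_in_parallel(data, block_size=1024, upload_block=fake_upload)
reassembled = b"".join(store[block_id] for block_id in ids)
print(len(ids), reassembled == data)
```

Blocks may finish uploading in any order, so the ordered ID list is what preserves the content. In the azure-storage-blob package for Python, the corresponding calls are `stage_block` and `commit_block_list`, and `upload_blob` also accepts a `max_concurrency` argument that handles the splitting for you.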
- Choose the Right Blob Type
Each blob type has its own characteristics, so choose the one that best suits your needs. Block blobs are suitable for streaming content, because the individual blocks can be served easily to streaming solutions. Make sure to use parallel uploads for large block blobs. Page blobs enable you to read and write a particular part of a blob, so the rest of the blob is not affected.
- Improve Availability and Caching with Snapshots
Blob snapshots are read-only, point-in-time copies of a blob, and they can increase the availability of Azure storage by acting as additional sources for reads. Snapshots give you a backup copy of the blob, and you are charged only for the blocks or pages that differ from the base blob rather than for a full second copy. You can increase the availability of the entire system by creating several snapshots of the same blob and serving them to customers. Use the snapshots as the default source for read operations and reserve the original blob for writes.
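Serving reads from several snapshots can be sketched as a simple round-robin over snapshot URLs. The example below assumes that a snapshot is addressed by adding a `snapshot` query parameter with the snapshot timestamp to the blob URL. The account name, container, blob, and timestamps are placeholders, and `SnapshotReader` is a hypothetical helper, not an SDK class.

```python
from itertools import cycle
from typing import Sequence
from urllib.parse import urlencode


def snapshot_url(blob_url: str, snapshot: str) -> str:
    """Address one snapshot of a blob via the `snapshot` query parameter."""
    return f"{blob_url}?{urlencode({'snapshot': snapshot})}"


class SnapshotReader:
    """Hand out snapshot URLs in round-robin order to spread read load."""

    def __init__(self, blob_url: str, snapshots: Sequence[str]) -> None:
        if not snapshots:
            raise ValueError("at least one snapshot is required")
        self._urls = cycle([snapshot_url(blob_url, s) for s in snapshots])

    def next_read_url(self) -> str:
        return next(self._urls)


# Placeholder account, container, blob, and snapshot timestamps.
reader = SnapshotReader(
    "https://exampleaccount.blob.core.windows.net/media/video.mp4",
    ["2024-01-01T00:00:00.0000000Z", "2024-01-01T00:05:00.0000000Z"],
)
urls = [reader.next_read_url() for _ in range(3)]
print(urls[0] == urls[2], urls[0] != urls[1])
```

Writes keep going to the base blob, so a snapshot only reflects the blob as it was at the moment the snapshot was taken; new snapshots have to be created when readers need fresher data.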
- Enable a Content Delivery Network (CDN).
A content delivery network is a network of servers that can improve availability and reduce latency by caching content on servers that are close to end-users. When you use a CDN for Blob storage, the network places a copy of each blob closer to the client. Accordingly, each client request is routed to the closest CDN node that holds the blob.
The best practices in this article should allow you to efficiently manage blobs, lower storage costs, and deliver high availability through an auto-scaling environment. Blob storage management allows you to focus on your applications rather than infrastructure. Hopefully, after applying these practices, you’ll be able to use blobs in an efficient and cost-effective manner.