Moving media from SQL to Blob Storage

databases

Introduction

First of all, a little bit about how sitecore stores media. A media item in sitecore (images / videos / docs etc) has either a Versioned or an Unversioned File base type. Each of these base templates contains a field called Blob FieldIds 40E50ED9-BA07-4702-992E-A912738D32DC and DBBE7D99-1388-4357-BB34-AD71EDF18ED3 respectively).

When you attach a file to a media item (JPG for example) , that file is converted to BLOB (Binary Large Object) format – which is effectively a very long string representing the data for the file. That BLOB is stored in the SQL Blob table and given an ID (i.e. 12345678-1234-1234-123456789012). That ID is then stored as the Value in the relevant fields table, together with the FieldId (to identify as blob field) and ItemId (to link back to item).

What happens when using blob storage?

When the blob storage module is installed and configured, each unique blob is moved to a blob storage container (in Azure for example). Everything stays the same in Sitecore, except that the Value in the relevant field table is prefixed with the value “blob://”.

So in our example above, the new value would be blob://12345678-1234-1234-123456789012. Both methods of storing blobs can run in a sitecore instance concurrently. When a request is made to get the blob data – it check for the blob prefix. If present the request is directed to storage – if not it knows to get the data from the usual SQL Blob table.

Whats the benefits of using blob storage?

Im sure there are more, but these are the ones that jumped out to me:

  1. Reduced database size – leader to faster backups and more portable databases
  2. Reduced cost of storage
  3. No duplication of storage – in usual setup, when item is published the blob is copied to web DB and all publishing targets (i.e. possible 4 copies). With blob storage, only the value is copied between DBs, each instance access the single blob from storage.

Migrating the blobs

There are a number of blogs out there talling through the process, so I wont go into too much detail. But in short, you need to:

  1. Create a blob storage account
  2. Download the module and add the files to your solution
  3. Update the connection string
  4. Update config to identify your new container name.
  5. Run the migration script

Full details here: https://doc.sitecore.com/xp/en/developers/93/sitecore-experience-manager/enable-the-azure-blob-storage-module-for-sitecore-blobs.html

Problem: It takes a very long time

If you have a very large number of blobs to migrate (as we did) then get ready for the migration script to take a very long time… im talking days or weeks to complete! Thats why its very important to remove any orphaned blobs first.

I believe the main reason that the process takes so long is because we hit some kind of HTTP request thottling on our Azure VMs. Reason being, the script would execute and progress would be several thousands per hour. Then all of a sudden this drops down to the hundreds per hour…

Handy Tip: run the script on two separate instances, but from different directions

One handy tip I would recommend would be to run the script on two separate instances, but from different directions. But on one of them alter this line in the migration script, to ORDER BY DESC:

SELECT DISTINCT [BlobId] FROM [Blobs] ORDER BY [BlobId] ASC OFFSET $(($PageNumber-1)*$PageSize) ROWS FETCH NEXT $PageSize ROWS ONLY

That way both start from opposite ends (at fast rate to begin – so you benefit from x2 the unthrottled allowance). Once the count gets close to meeting in the middle, kill the later process. Then, when the first process reaches the files that have already been uploaded, it finds that the item exists in blob storage and quickly moves on, without having to upload.

Problem: Loss of data

Another issue we found was that after having uploaded all of our blobs to storage. We would then run the OTB ‘Clean up databases’ task from within the sitecore control panel. However, this led to it unexpectedly deleting (in a heart beat – after taking so long to upload…) a big chunk of our blobs!

Handy tip: once you have completed the migration – clone your blob storage container – before doing any clean up tasks

It turns out the reason for our blobs being deleted, was due to the format of the original Values in the field table. Somehow (still not entirely sure how or why), we had a range of formats for the blob id in the fields table. With some being upper case, some lower, some with brackets before and after and some without… Examples:

  • 3CE858B3-400D-4530-BDF2-CC13509B5607
  • 3ce858b3-400d-4530-bdf2-cc13509b5607
  • {3CE858B3-400D-4530-BDF2-CC13509B5607}
  • {3ce858b3-400d-4530-bdf2-cc13509b5607}

We can only assume this is a relic from a previous upgrade. Either way, the migration script standardizes this by removing the brackets and storing the value as the ID in blob storage. When the cleanup task ran, it was not matching the incorrect case to the value in the fields table and hence deleting.

The solution in this situation was to run a script to convert all the Media IDs to lowercase, which then matched those values in blob storage.

Summary

The main issue we found when adopting the blob storage module, was the time it took to run our tests, and then the time it took to learn from our mistakes… all in all though it was definitely worth the effort. We have went from having a database that was over 1 TB in size, to one that after backed up is less than 2 G in size.

This means we can quickly and easily pull down a copy of the latest production data to a developer machine, send a copy to sitecore support if required. It has reduced our storage costs and makes the backup and restore process a great deal simpler.

Perhaps the greatest benefit though is the ability to be able to snapshot the DBs on a regular basis and save to an image, which when used together with our solution images can bring up a recent containerised version of our corporate site in minutes. Which forms part of our contingency planning.

Leave a Reply

Your email address will not be published. Required fields are marked *