This post covers one of the steps we took to reduce the size of our master database.
With our media library tidyied up a ittle bit, it was now time to tackle the massive size of our master database. If ever your databse is over sized, then a good place to start looking is the blobs table. Each single record in this table can be many MB in size, so a small problem here can quickly add up to a much bigger problem.
It was easy to see that something was not quite correct, because the number of blobs in our Blobs table far exceeded the number of media fields that had a value. To find these two bits of information, you can perform the following queries.
Number of distinct blobs
SELECT COUNT (DISTINCT BlobId) as 'Blobs', COUNT(BlobId) as 'Records' FROM [dbo].[Blobs]
Number of media fields with value
SELECT COUNT ( [Id]) AS Shared
FROM [93_Corp_Sitecore_master_New].[dbo].[SharedFields]
WHERE (FieldId='40E50ED9-BA07-4702-992E-A912738D32DC' OR FieldId='DBBE7D99-1388-4357-BB34-AD71EDF18ED3') and [Value] like '%{%}'
So in our case, we had approximately 750,000 blobs, accessed by only 55,000 media fields.
The obvious answer
You might say – so why didnt you just run the cleanup database task, which in turn performs a clean up blobs operation? Well, we did just that and each time the dialog prompt returned with this response after some time:
Job started: CleanUpDatabases|Job ended: CleanUpDatabases (units processed: )
Its easy to think that that task has run successfully, but you will notice that it in fact processed no units. A bit of further digging provided more information:
Exception: System.Data.SqlClient.SqlException
Message: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Source: .Net SqlClient Data Provider
Sitecore Support
Its at this point we reached out to sitecore support for some help. There initial suggestion was to increase the DefaultSQLTimeout (which defaults to 30 secs) to 5 mins. We tried this and then also tried 5 hrs, but still no success. Again it doesnt run to completion.
The next suggested course of action was to alter the CleanupBlobsBatchSize (default 1000). Which we tried at 750 and also 500, but again no success.
Interestingly, at no point during this whole process has a single blob been removed…
The solution
After all the above failed, the support guys were able to provided us with two SQL scripts that allowed us to select all the relevant blobs from the database and then start removing them in batches.
The first scripts selects the blobs that dont belong (took 5 hours for us). The second script deletes those selected blobs in batches (this took over 50 hours). So no wonder the default 30 secs SQLTimeout was not enough.
After completing these steps (and shrinking the database), the size of our database dropped from over 1TB to close to 200G. This was a great improvement, but obviously no where near small enough for us to create a snapshot SQL Image for use with docker and kubernetes.
The next post in this series will look at how we moved the remaining blobs from SQL Server to blob storage. This was done using Sitecore OTB blob storage module, but was not as sraight forward as we hoped.
First SQL Script – SELECT_Orphaned_Blobs.sql
This essentially does a trial run of the task – showing you exactly what will be deleted
WITH ExistingBlobs (BlobId) AS (
SELECT Blobs.BlobId FROM Blobs
INNER JOIN SharedFields
ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = SharedFields.Value
UNION
SELECT Blobs.BlobId FROM Blobs
INNER JOIN VersionedFields
ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = VersionedFields.Value
UNION
SELECT Blobs.BlobId FROM Blobs
INNER JOIN UnversionedFields
ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = UnversionedFields.Value
UNION
SELECT Blobs.BlobId FROM Blobs
INNER JOIN ArchivedFields
ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = ArchivedFields.Value
)
SELECT COUNT(1) FROM Blobs
LEFT JOIN ExistingBlobs ON Blobs.BlobId = ExistingBlobs.BlobId
WHERE ExistingBlobs.BlobId IS NULL
Second SQL Script – DELETE_Orphaned_Blobs_In_Batches.sql
This cycles through the selected blobs and deletes in batches
IF OBJECT_ID('tempdb..#ExistingBlobs') IS NOT NULL
BEGIN
DROP TABLE #ExistingBlobs
END
CREATE TABLE #ExistingBlobs (BlobId uniqueidentifier NOT NULL)
INSERT INTO #ExistingBlobs (BlobId)
SELECT Blobs.BlobId FROM Blobs
INNER JOIN SharedFields
ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = SharedFields.Value
UNION
SELECT Blobs.BlobId FROM Blobs
INNER JOIN VersionedFields
ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = VersionedFields.Value
UNION
SELECT Blobs.BlobId FROM Blobs
INNER JOIN UnversionedFields
ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = UnversionedFields.Value
UNION
SELECT Blobs.BlobId FROM Blobs
INNER JOIN ArchivedFields
ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = ArchivedFields.Value
DECLARE @t INT;
DECLARE @batchsize INT;
SET @batchsize = 50;
SET @t = @batchsize;
WHILE @t > 0
BEGIN
BEGIN TRANSACTION;
DELETE TOP (@batchsize) Blobs FROM Blobs
LEFT JOIN #ExistingBlobs ON Blobs.BlobId = #ExistingBlobs.BlobId
WHERE #ExistingBlobs.BlobId IS NULL
SET @t = @@ROWCOUNT;
COMMIT TRANSACTION;
END
IF OBJECT_ID('tempdb..#ExistingBlobs') IS NOT NULL
BEGIN
DROP TABLE #ExistingBlobs
END
Credit for this goes to Andrily Shcherbak at Sitecore Support
A word of caution
Mark Lowe has suggested in the comments, that the script above could delete blobs that are still currently in use.
Caution: This script doesn’t fit all versions. I tried this on a 9.3 database and it was deleting blobs which were still in use.
MARK LOWE
This wasnt the case, for me, but I would certainly recommend that anyone considering using the script, first of all tries this on a test environment. Also, that they first run the SELECT_Orphaned_Blobs.sql script, then join the resulting blob IDs back to the Fields view to see if there are any matches.
If there are any matches, it could be an issue with casing, as I remember having an issue with some blob IDs being all uppercase and some being all lowercase. In this situation, it should be simple enough to update values to one or the other.
Caution: This script doesn’t fit all versions. I tried this on a 9.3 database and it was deleting blobs which were still in use. When looking at the SQL script run by the CleanupBlobs Job I found that there are more checks inside the ExistingBlobs query. I’d suggest trying to increase the timeout and using the
…and using the DBCleanup.aspx admin page.
I have been dealing with my own issues around the Blob table growing out of control and have my experience and elaboration on the issues Mark refers to. If you are unable to get the Sitecore CleanupBlobs feature to work for you, take a read of my full writeup which builds on a number of articles including this one. https://benrichardson.uk/2023/10/06/we-need-to-talk-about-sitecore-blobs/
I think its wiser to find all fields of type ‘attachment’, cast their values to a guid and filter the Blobs table on those. In that case, you don’t need to worry about the varchar representation of the guid in the Value column, which indeed differs in casing and whether it includes { and }.