Remove orphaned blobs from sitecore database

databases

This post covers one of the steps we took to reduce the size of our master database.

With our media library tidyied up a ittle bit, it was now time to tackle the massive size of our master database. If ever your databse is over sized, then a good place to start looking is the blobs table. Each single record in this table can be many MB in size, so a small problem here can quickly add up to a much bigger problem.

It was easy to see that something was not quite correct, because the number of blobs in our Blobs table far exceeded the number of media fields that had a value. To find these two bits of information, you can perform the following queries.

Number of distinct blobs

SELECT COUNT (DISTINCT BlobId) as 'Blobs', COUNT(BlobId) as 'Records' FROM [dbo].[Blobs]

Number of media fields with value

SELECT COUNT ( [Id]) AS Shared
FROM [93_Corp_Sitecore_master_New].[dbo].[SharedFields]
WHERE (FieldId='40E50ED9-BA07-4702-992E-A912738D32DC' OR FieldId='DBBE7D99-1388-4357-BB34-AD71EDF18ED3') and [Value] like '%{%}'

So in our case, we had approximately 750,000 blobs, accessed by only 55,000 media fields.

The obvious answer

You might say - so why didnt you just run the cleanup database task, which in turn performs a clean up blobs operation? Well, we did just that and each time the dialog prompt returned with this response after some time:

Job started: CleanUpDatabases|Job ended: CleanUpDatabases (units processed: )

Its easy to think that that task has run successfully, but you will notice that it in fact processed no units. A bit of further digging provided more information:

Exception: System.Data.SqlClient.SqlException
Message: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Source: .Net SqlClient Data Provider

Sitecore Support

Its at this point we reached out to sitecore support for some help. There initial suggestion was to increase the DefaultSQLTimeout (which defaults to 30 secs) to 5 mins. We tried this and then also tried 5 hrs, but still no success. Again it doesnt run to completion.

The next suggested course of action was to alter the CleanupBlobsBatchSize (default 1000). Which we tried at 750 and also 500, but again no success.

Interestingly, at no point during this whole process has a single blob been removed...

The solution

After all the above failed, the support guys were able to provided us with two SQL scripts that allowed us to select all the relevant blobs from the database and then start removing them in batches.

The first scripts selects the blobs that dont belong (took 5 hours for us). The second script deletes those selected blobs in batches (this took over 50 hours). So no wonder the default 30 secs SQLTimeout was not enough.

After completing these steps (and shrinking the database), the size of our database dropped from over 1TB to close to 200G. This was a great improvement, but obviously no where near small enough for us to create a snapshot SQL Image for use with docker and kubernetes.

The next post in this series will look at how we moved the remaining blobs from SQL Server to blob storage. This was done using Sitecore OTB blob storage module, but was not as sraight forward as we hoped.

First SQL Script - SELECT_Orphaned_Blobs.sql
This essentially does a trial run of the task - showing you exactly what will be deleted

WITH ExistingBlobs (BlobId) AS (  
	SELECT Blobs.BlobId FROM Blobs
		INNER JOIN SharedFields
			ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = SharedFields.Value

	UNION
	SELECT Blobs.BlobId FROM Blobs
		INNER JOIN VersionedFields
			ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = VersionedFields.Value

	UNION
	SELECT Blobs.BlobId FROM Blobs
		INNER JOIN UnversionedFields
			ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = UnversionedFields.Value

	UNION
	SELECT Blobs.BlobId FROM Blobs
		INNER JOIN ArchivedFields
			ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = ArchivedFields.Value
)

SELECT COUNT(1) FROM Blobs
	LEFT JOIN ExistingBlobs ON Blobs.BlobId = ExistingBlobs.BlobId
WHERE ExistingBlobs.BlobId IS NULL

Second SQL Script - DELETE_Orphaned_Blobs_In_Batches.sql
This cycles through the selected blobs and deletes in batches

IF OBJECT_ID('tempdb..#ExistingBlobs') IS NOT NULL
BEGIN
    DROP TABLE #ExistingBlobs
END

CREATE TABLE #ExistingBlobs (BlobId uniqueidentifier NOT NULL)

INSERT INTO #ExistingBlobs (BlobId)
	SELECT Blobs.BlobId FROM Blobs
		INNER JOIN SharedFields
			ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = SharedFields.Value

	UNION
	SELECT Blobs.BlobId FROM Blobs
		INNER JOIN VersionedFields
			ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = VersionedFields.Value

	UNION
	SELECT Blobs.BlobId FROM Blobs
		INNER JOIN UnversionedFields
			ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = UnversionedFields.Value

	UNION
	SELECT Blobs.BlobId FROM Blobs
		INNER JOIN ArchivedFields
			ON '{' + CONVERT(NVARCHAR(MAX), Blobs.BlobId) + '}' = ArchivedFields.Value

DECLARE @t INT;
DECLARE @batchsize INT;

SET @batchsize = 50;
SET @t = @batchsize;

WHILE @t > 0
BEGIN
BEGIN TRANSACTION;

	DELETE TOP (@batchsize) Blobs FROM Blobs
		LEFT JOIN #ExistingBlobs ON Blobs.BlobId = #ExistingBlobs.BlobId
	WHERE #ExistingBlobs.BlobId IS NULL

	SET @t = @@ROWCOUNT;

COMMIT TRANSACTION;
END

IF OBJECT_ID('tempdb..#ExistingBlobs') IS NOT NULL
BEGIN
    DROP TABLE #ExistingBlobs
END

Credit for this goes to Andrily Shcherbak at Sitecore Support

A word of caution

Mark Lowe has suggested in the comments, that the script above could delete blobs that are still currently in use.

Caution: This script doesn’t fit all versions. I tried this on a 9.3 database and it was deleting blobs which were still in use.

MARK LOWE

This wasnt the case, for me, but I would certainly recommend that anyone considering using the script, first of all tries this on a test environment. Also, that they first run the SELECT_Orphaned_Blobs.sql script, then join the resulting blob IDs back to the Fields view to see if there are any matches.

If there are any matches, it could be an issue with casing, as I remember having an issue with some blob IDs being all uppercase and some being all lowercase. In this situation, it should be simple enough to update values to one or the other.

4 thoughts on “Remove orphaned blobs from sitecore database

  1. Caution: This script doesn't fit all versions. I tried this on a 9.3 database and it was deleting blobs which were still in use. When looking at the SQL script run by the CleanupBlobs Job I found that there are more checks inside the ExistingBlobs query. I'd suggest trying to increase the timeout and using the

  2. I think its wiser to find all fields of type 'attachment', cast their values to a guid and filter the Blobs table on those. In that case, you don't need to worry about the varchar representation of the guid in the Value column, which indeed differs in casing and whether it includes { and }.

Leave a Reply

Your email address will not be published. Required fields are marked *