So it’s the weekend and I’m sat in a soft play. We are here for my daughter’s friend’s birthday party. I get the kind of call that, as a developer responsible for a large, high-profile website, you don’t ever want to receive. It is a call from the help desk, who inform me that their monitoring systems are going crazy. Multiple key pages are missing from the website, with more and more disappearing as time goes on.
Luckily, it’s time to leave, so we say our goodbyes and head home to see what can be done.
When I get online and access the Sitecore back end, my heart sinks at what I see. On one of our publishing targets, all the content is missing from beneath the content node.
What could it be? Some kind of cyber incident? A deliberate, malicious deletion?
When looking at the second publishing target, we could see that it wasn’t quite as bad. On that instance only partial chunks of content were missing. We made the assumption it was a publishing issue and recycled the publishing instance. With publishing stopped, we then pointed all CDs (content delivery servers) at the more populated target.
Interestingly enough, though, the master DB looked fully intact. Analysis of the master database Items table showed a similar number of items to that of our pre-upgrade instance from a few weeks prior.
This led us to our first temporary solution. Given that the pre-upgrade installation was still available and only a few weeks out of date, we switched all our IIS bindings over to the old site and were quickly able to have a full site displaying again.
So what went wrong…
If there was ever a time to raise a Sitecore ticket, then this was it. Analysis of our logs showed lots of publish-related errors, as well as many exceptions relating to ItemDeleted events:
Invalid event arg type: Sitecore.Data.Events.ItemDeletedRemoteEventArgs. Expected: SitecoreEventArgs
We could also see that there was only one significant publish event earlier that day. This was a publish of a fairly deep item (4 levels deep) with related items set to true. Our question to Sitecore was: how could that publish event cause all of our content to be removed from our web database, yet leave the master untouched?
They were able to reproduce the issue when there was a difference between the “/sitecore” root item in the master and web databases.
From Sitecore 10.1 onwards, the default Sitecore items are stored in resource files (.dat). As part of the upgrade process, you need to remove all default Sitecore items using the Sitecore.Update tool.
This is what appears to have happened in our case. We removed all the default items from the master database, but we didn’t fully remove them from each of our publishing targets.
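In hindsight, a simple comparison of item IDs between master and each publishing target would have flagged the leftover rows. Here is a minimal sketch in Python, assuming you have already exported the IDs from each database’s Items table (for example with `SELECT ID FROM Items`) along with a list of the default item IDs. The function names and the sample content ID are hypothetical; only the “/sitecore” root ID is, to the best of my knowledge, the real well-known value.

```python
# Well-known ID of the "/sitecore" root item (assumed here; verify against your instance).
SITECORE_ROOT_ID = "11111111-1111-1111-1111-111111111111"


def normalise(item_id):
    """Strip braces and lowercase so IDs exported in different formats compare equal."""
    return item_id.strip().strip("{}").lower()


def leftover_default_items(master_ids, target_ids, default_ids):
    """Return default item IDs that still have SQL rows in a publishing target
    even though they have already been removed from master."""
    master = {normalise(i) for i in master_ids}
    target = {normalise(i) for i in target_ids}
    defaults = {normalise(i) for i in default_ids}
    return sorted((target - master) & defaults)


# Hypothetical sample data: master has been cleaned, web still holds the root row.
master = ["{AAAAAAAA-0000-0000-0000-000000000001}"]
web = ["{11111111-1111-1111-1111-111111111111}",
       "{AAAAAAAA-0000-0000-0000-000000000001}"]

print(leftover_default_items(master, web, [SITECORE_ROOT_ID]))
```

Any ID this returns is a default item that lives in the target’s SQL tables but not in master’s, which is exactly the mismatch we had.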
So when the publish happened with the “related items” checkbox ticked, the publishing operation attempted to publish the “/sitecore” item. Sitecore checked whether this item was the same in both databases. Since they were not the same, Sitecore tried to delete this item from the web database, which is expected behavior. The delete process started from its child items.
Because the “/sitecore/content” item is a resource item, the deletion process was aborted, and the “/sitecore” item remained unchanged. Therefore you can see the empty content tree under the “/sitecore/content” item.
Yurii Pysanko
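To convince myself of the explanation, it helps to model it. The following is a toy simulation in Python, not Sitecore’s actual code: the `Item` class, the `from_resource` flag and the function names are all made up for illustration. It assumes the delete works children-first and aborts as soon as it reaches an item that lives in a resource file.

```python
class DeletionAborted(Exception):
    """Raised when the delete reaches an item that cannot be removed."""


class Item:
    def __init__(self, name, from_resource=False, children=None):
        self.name = name
        self.from_resource = from_resource  # True if the item lives in a .dat file
        self.children = list(children or [])


def delete_recursively(item, parent=None):
    # Children are deleted before the item itself.
    for child in list(item.children):
        delete_recursively(child, item)
    # A resource item cannot be deleted, so the whole operation stops here.
    if item.from_resource:
        raise DeletionAborted(item.name)
    if parent is not None:
        parent.children.remove(item)


def simulate():
    # Web database: "/sitecore" is still a stale SQL row, "/sitecore/content"
    # comes from the resource file, and everything below it is ordinary content.
    home = Item("Home", children=[Item("About"), Item("News")])
    content = Item("content", from_resource=True, children=[home])
    root = Item("sitecore", children=[content])
    try:
        delete_recursively(root)
    except DeletionAborted:
        pass  # the delete gives up, but the children are already gone
    return root


root = simulate()
print(root.name, [c.name for c in root.children], root.children[0].children)
```

In this model the “sitecore” and “content” items survive while everything beneath “content” has been removed, which matches what we saw on the worst-hit publishing target.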
Conclusions and lessons learnt
Our (incorrect) assumption was this: the publish operation, when making the comparison between master and web, would use the combined result of the default .dat files plus the SQL data. We also assumed that no combination of events could lead to such a mass deletion of all content.
That said, I think it is safe to say this happened because we failed to follow the exact steps shown in the upgrade guide. Paragraph 3.2.3 (for Sitecore 10.1) describes how to remove default Sitecore items from your databases, or how to detect whether they have been modified, in which case you should remove them manually. You can find the upgrade guide on the following page: https://dev.sitecore.net/Downloads/Sitecore_Experience_Platform/101/Sitecore_Experience_Platform_101.aspx