We recently upgraded our sitecore instance from 9.3.0 to 10.1.2. With this came some major changes. One of which was a move from Mongo to SQL to host our Collection databases.
Since the upgrade, our xconnect collection instance never seemed to be in a good place. Data was flowing as expected and analytics was being shown in the backend of sitecore. However, we were finding a large number of exceptions in the logs similar to below:
Sitecore.XConnect.Operations.EntityOperationException: Operation #0, AlreadyExists, Contact. XdbContext Batch Execution Exception
Sitecore.XConnect.Operations.EntityOperationException: Operation #1, ReferenceNotFound, DeviceProfile {7bf8e76e-38ad-49d5-878a-77f442cc4024}
We double checked and then triple checked all our configuration. Reviewed the upgrade guides to make sure we hadnt over looked something. Published all marketing definitions, rebuild the reporting database, rebuilt anaalytics index... literally everything we could think of to get a clean running instance. But unfortunately nothing seemed to do the trick.
Sitecore support
It was time to reach out to the guys at sitecore support again.
This ticket with the support team went on for a very long time! As per the usual process we sent over every scrap of data that we could lay our hands on. We went through several rounds of dicussions with the support team. Analysing custom code, session setup and tweaked configuration, but the problem kept coming back.
To aid the investigation, we added a requests database and custom binaries to give detailed tracking information. This was done both on the CD servers and the XConnect instances, but alas the root cause was unknown.
Finally, after months of back and forth. Yevhen from the support team provided some information which seemed to provide an answer.
I assume that it can be related to issue 453184 “Contact identifier index was not deleted during failed contact saving operation and Tracker fails due to AlreadyExist exception”. Such a situation can occur if one of the Shard databases is unavailable. I checked your shard databases and find a lot of identifiers in the “ContactIdentifiersIndex” tables that don’t have appropriate identifiers in the “ContactIdentifiers” tables.
Yevhen Povzlo
The solution
It appears that the issue occurs when one of our shard databases becomes unavailable. In such a situation, a contact identifier is unable to be deleted from that database. Over time the number of these identifiers increases. Apparently this build up can cause the errors shown.
The advice provided was to run SQL to find any identifiers in the “ContactIdentifiersIndex” tables that don’t have appropriate identifiers in the “ContactIdentifiers” tables:
To find the entries run this SQL:
(
select ContactId, Source from [Sitecore_Xdb.Collection.Shard0].[xdb_collection].[ContactIdentifiersIndex]
UNION
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard1].[xdb_collection].[ContactIdentifiersIndex]
)
except
(
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard0].[xdb_collection].[ContactIdentifiers]
UNION
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard1].[xdb_collection].[ContactIdentifiers]
)
If the above query is successful in finding identifiers, then you can use the following SQL to remove them.
delete a from [Sitecore_Xdb.Collection.Shard0].[xdb_collection].[ContactIdentifiersIndex] a
join
(
(
select ContactId, Source from [Sitecore_Xdb.Collection.Shard0].[xdb_collection].[ContactIdentifiersIndex]
UNION
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard1].[xdb_collection].[ContactIdentifiersIndex]
)
except
(
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard0].[xdb_collection].[ContactIdentifiers]
UNION
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard1].[xdb_collection].[ContactIdentifiers]
)
) b
on a.ContactId = b.ContactId and a.Source=b.Source
delete a from [Sitecore_Xdb.Collection.Shard1].[xdb_collection].[ContactIdentifiersIndex] a
join
(
(
select ContactId, Source from [Sitecore_Xdb.Collection.Shard0].[xdb_collection].[ContactIdentifiersIndex]
UNION
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard1].[xdb_collection].[ContactIdentifiersIndex]
)
except
(
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard0].[xdb_collection].[ContactIdentifiers]
UNION
SELECT ContactId, Source from [Sitecore_Xdb.Collection.Shard1].[xdb_collection].[ContactIdentifiers]
)
) b
on a.ContactId = b.ContactId and a.Source=b.Source
Running the SQL script above, indeed stopped the errors from littering our logs. However, as time went on, the issue continued to happen (im guessing as the shard periodically was unavailable).
Further advice from Sitecore on this issue was to review the health (CPU / Memory) of our SQL servers. Then to review index fragmentation percentages on the shard databases, as heavily fragmented indexes can affect database performance
Example SQL to check the fragmentaion is as follows:
SELECT dbschemas.[name] as 'Schema',
dbtables.[name] as 'Table',
dbindexes.[name] as 'Index',
indexstats.avg_fragmentation_in_percent,
indexstats.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]
AND indexstats.index_id = dbindexes.index_id
WHERE indexstats.database_id = DB_ID()
ORDER BY indexstats.avg_fragmentation_in_percent desc
If the above returns a fragmentation percentage of more than 10%, then you need to rebuild the fragmented indexed.
Further resources on this subject can be found here:
https://sqlshack.com/how-to-identify-and-resolve-sql-server-index-fragmentation
It was an interesting case! 🙂
Hope you are doing fine Dean and don't see any already exists issues! 🐱🏍
Hi Dean,
Wow, I have the same issue and resolved it the same way => but cannot be sure it doesn't happen in the future and how to prevent it from happening again.