We recently undertook an upgrade of our Sitecore installation from version 9.3 to 10.1.2. This blog post will look at some of the issues that we faced when migrating our XConnect data between old and new instances.
With the release of version 10.1, Sitecore stopped supporting the use of MongoDB for the XConnect collection data. With that in mind, it was necessary for us to migrate our data from the old XConnect 9.3 (running on MongoDB) to our new XConnect installation (running on SQL Server).
Luckily, Sitecore provide a tool to help with this task on GitHub: https://github.com/Sitecore/XConnect-Code-Samples. This console application uses the XConnect Client API to read contacts (together with their related facets, identifiers, and interactions) from the source and submit them to the target instance.
Before you start migrating data, double-check the following on the source XConnect instance:
- Verify that XConnect is not collecting any new data.
- Verify that the Marketing Automation pool is empty and there is no ongoing task processing.
- Verify that the Processing Pool is empty and there are no active or different items processing.
- Verify that the Tracker Submit Queue is empty and there is no ongoing data submission.
When you are ready to begin the migration, update the connection strings in the config file, then build and run Sitecore.XConnect.DataMigration.Tool.exe. Simple, right?
Our Experience
When running the migration tool, we experienced a number of issues.
Issue One: First of all, the application would often fail to connect to the source XConnect instance. It would only start successfully on the fifth or sixth launch attempt. Once a connection was made, it stayed active (unless it hit an issue).
The application logs seemed to show only the top-level exception:
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at Sitecore.XConnect.DataMigration.Source.XConnectReader.Execute() in C:\TFS\XConnect-Code-Samples\XConnect-Code-Samples-main\code\Sitecore.XConnect.DataMigration.Source\XConnectReader.cs:line 53
But looking closely at the console output, we could see that xDB wasn't available for some reason:
Sitecore.XConnect.XdbCollectionUnavailableException: An error occurred while sending the request. ---> Sitecore.Xdb.Common.Web.ConnectionTimeoutException: A task was canceled
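Rather than relaunching the tool by hand until it connects, the initial call could be wrapped in a simple retry loop. The following is only a minimal sketch: `RetryHelper`, `maxAttempts` and `delay` are hypothetical names and are not part of the migration tool. It retries on any exception for simplicity; in the tool itself you would probably narrow the catch to XdbCollectionUnavailableException.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the Sitecore migration tool.
public static class RetryHelper
{
    // Runs `action` up to `maxAttempts` times, pausing for `delay` after each
    // failed attempt. If the final attempt fails, its exception propagates.
    public static T Execute<T>(Func<T> action, int maxAttempts, TimeSpan delay)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            // The filter only catches while attempts remain, so the last
            // failure is rethrown with its original stack trace.
            catch (Exception ex) when (attempt < maxAttempts)
            {
                Console.WriteLine($"Attempt {attempt} of {maxAttempts} failed: {ex.Message}");
                Thread.Sleep(delay);
            }
        }
    }
}
```

A call such as `RetryHelper.Execute(() => sourceReader.Execute(bookmark), 6, TimeSpan.FromSeconds(30))` would then cover the five or six attempts we were making manually, assuming the reader call is safe to repeat.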
Issue Two: Once the application started successfully, it would run for 24 to 26 hours and then fail with the following exception:
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at Sitecore.XConnect.DataMigration.Source.XConnectReader.Execute(Byte[] bookmark) in C:\TFS\XConnect-Code-Samples\XConnect-Code-Samples-main\code\Sitecore.XConnect.DataMigration.Source\XConnectReader.cs:line 75
---- removed for brevity ----
Sitecore.XConnect.XdbCollectionUnavailableException: An error occurred while sending the request. ---> Sitecore.Xdb.Common.Web.ConnectionTimeoutException: A task was canceled
---- removed for brevity ----
Sitecore.XConnect.DataMigration.Source.ContactEnumeratorReader.<ReadBatch>d__6.MoveNext() in C:\TFS\XConnect-Code-Samples\XConnect-Code-Samples-main\code\Sitecore.XConnect.DataMigration.Source\ContactEnumeratorReader.cs:line 39
As you can see, the error is similar to the first, but originates at the point where the contact enumerator is cycling through contacts.
Troubleshooting
To ensure connectivity to XConnect, we made sure that collection and search were correctly showing a timestamp in the browser for all instances. We also set up a scheduled task to ping the front ends every 5 minutes, to ensure they were always warmed up and ready to receive requests. Finally, we confirmed that there were no exceptions in any of the logs.
Double check the prerequisites
The first thing we found was that we had not followed the prerequisites exactly. We read the first prerequisite, "Verify that XConnect is not collecting any new data", and assumed (incorrectly) that because we had switched off the CM instance, XConnect would no longer be collecting or processing data. Hence we thought we didn't need to worry about the other prerequisites.
At one point during the upgrade process, we switched off the old instances and began using the new ones. The problem was that the various queues were still populated (GenericProcessingPool held 2 million records), so once the old XConnect instance was switched back on for the migration, these pools started being processed again.
To confirm this, check the following queues and clear them if necessary:
- [Sitecore_MarketingAutomation].xdb_ma_pool.AutomationPool
- [Sitecore_Processing.Pools].xdb_processing_pools.InteractionLiveProcessingPool
- [Sitecore_Processing.Pools].xdb_processing_pools.GenericProcessingPool
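One way to check these queues is to count the rows in each table. The sketch below is an assumption-laden illustration, not something shipped with the tool: `PoolChecker` is a hypothetical name, the table names are copied from the list above (your database names may carry a different prefix), and the three-part names assume all databases live on the same SQL Server instance and are reachable with one login.

```csharp
using System;
using System.Data;

// Hypothetical helper for checking whether the pools are empty.
public static class PoolChecker
{
    // Database.schema.table names taken from the list above.
    public static readonly string[] PoolTables =
    {
        "[Sitecore_MarketingAutomation].[xdb_ma_pool].[AutomationPool]",
        "[Sitecore_Processing.Pools].[xdb_processing_pools].[InteractionLiveProcessingPool]",
        "[Sitecore_Processing.Pools].[xdb_processing_pools].[GenericProcessingPool]"
    };

    // Builds a plain row-count query for one pool table.
    public static string BuildCountQuery(string table)
    {
        return "SELECT COUNT(*) FROM " + table;
    }

    // Runs the count against an already opened connection.
    public static long CountRows(IDbConnection openConnection, string table)
    {
        using (IDbCommand command = openConnection.CreateCommand())
        {
            command.CommandText = BuildCountQuery(table);
            return Convert.ToInt64(command.ExecuteScalar());
        }
    }
}
```

Looping over `PoolChecker.PoolTables` and calling `CountRows` for each should report zero for every table before the migration starts; a non-zero count means the pool still has work queued.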
XConnect Timeouts
The next thing we looked at was increasing the XConnect timeout settings. The default timeout when using the API is 100 seconds (see the article Configuring xConnect Client API timeouts). You can increase it by creating a TimeoutHttpClientModifier and passing it in when initialising the connection:
// Build the list of HTTP client modifiers and add a 6-hour timeout (hours, minutes, seconds).
List<IHttpClientModifier> clientModifiers = new List<IHttpClientModifier>();
var timeoutClientModifier = new TimeoutHttpClientModifier(new TimeSpan(6, 0, 0));
clientModifiers.Add(timeoutClientModifier);
// uri, handlerModifiers and _xDbModel are defined elsewhere in the surrounding code.
// Pass the same modifiers to every client so the timeout applies to all of them.
var xConnectConfigurationClient = new ConfigurationWebApiClient(new Uri(uri + "configuration"), clientModifiers, handlerModifiers);
var xConnectCollectionClient = new CollectionWebApiClient(new Uri(uri + "odata"), clientModifiers, handlerModifiers);
var xConnectSearchClient = new SearchWebApiClient(new Uri(uri + "odata"), clientModifiers, handlerModifiers);
var xConnectClientConfig = new XConnectClientConfiguration(_xDbModel, xConnectCollectionClient, xConnectSearchClient, xConnectConfigurationClient);
Unfortunately, setting it to 6 hours (as in the example above) still didn't allow the program to complete successfully.
Contacts with a large volume of interactions
The next thing we looked at was whether any contacts had a large volume of interactions. I believe this was the main reason the migration was failing to complete. When we ran our queries in Mongo, we could see that 50 contacts had over 1,000 interactions, with one specific contact having in the region of 150,000, which clearly must be some kind of web crawler.
Having identified these contacts, we created a simple console application to pull up each individual contact and review some key information, then delete the contact if it was deemed not to be a genuine user. You can find more information about this process here: Identify and remove contacts with large volume of interactions. The console app for deleting the contacts is here: Delete contacts from XConnect.
Some minor adjustments
For good measure, we added a counter to both the read and write processes, so that we could see exactly how far the process had got while reading our 8 million contacts. We also added a try/catch around the sourceReader call in MigrationRunner.cs:
readResult = sourceReader.Execute(bookmark);
This, together with the changes outlined above, then led to our first successful completion of the migration process (in a little under 3 days).
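The counter and try/catch described above could look roughly like the sketch below. It does not reproduce the tool's MigrationRunner.cs: `ReadResult`, `MigrationLoop` and the `readBatch` delegate are hypothetical stand-ins, and the only detail taken from the stack traces is that the read call accepts a byte[] bookmark. Logging the whole exception, rather than just its message, also surfaces the inner exceptions that the application log was hiding.

```csharp
using System;

// Hypothetical stand-in for the result of one read; the tool's real type differs.
public sealed class ReadResult
{
    public int ContactCount { get; set; }
    public byte[] Bookmark { get; set; }   // null when there is nothing left to read
}

// Hypothetical loop illustrating the counter and try/catch.
public static class MigrationLoop
{
    // Reads batches until no bookmark is returned, logging a running total.
    // Returns the total number of contacts read.
    public static long Run(Func<byte[], ReadResult> readBatch)
    {
        long totalRead = 0;
        byte[] bookmark = null;

        do
        {
            ReadResult readResult;
            try
            {
                // Stands in for: readResult = sourceReader.Execute(bookmark);
                readResult = readBatch(bookmark);
            }
            catch (Exception ex)
            {
                // ex.ToString() includes inner exceptions and stack traces.
                Console.Error.WriteLine($"Read failed after {totalRead} contacts: {ex}");
                throw;
            }

            totalRead += readResult.ContactCount;
            bookmark = readResult.Bookmark;
            Console.WriteLine($"Contacts read so far: {totalRead}");
        }
        while (bookmark != null);

        return totalRead;
    }
}
```

With a running total in the output, a failure after a day or more of processing at least tells you how far the migration got.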
Summary
If you are having similar trouble getting the migration tool to complete, I recommend first looking for contacts with many interactions. It seems obvious now that pulling down a contact with such a high volume of metadata would take a long time and thus exceed the default timeouts.