Search Analytics: Capturing the data

Sitecore Page Flow

This is the first in a series of posts that look under the hood of the Search Analytics module.

In this post we will look at how the data is captured, either in XP-Mode, using XConnect events to delay processing until after the session ends, or in XM-Mode, where it is sent directly to SQL Server. In the next post we will look at how to create a module to surface the data and link to it from both the Launchpad and individual items in the Content Editor.

DATA CAPTURE

To start monitoring the value of searches on a website, there are three types of events that need to be captured. These are:

  1. Individual Searches – which search terms are used and how often they are entered.
  2. Search Rankings – which pages rank highest for a given search term and in what order they appear.
  3. Click Throughs – which pages are engaged with the most for a given search term.

Individual Searches and Search Rankings

These first two events are handled at the same point in time: after the user enters the search term, while the response for the search listings page is being generated.

You can find an example of a typical SearchController (here) that uses a SearchService (an injected dependency) to query the SOLR index. To register the events with the Search Analytics module, we send the context item, the search term and the top 20 ItemIDs to the ITrackSearch service (which is available via DI).

The ITrackSearch service then does one of two things, depending on whether the module is running in XP-Mode or XM-Mode. A check is made to see if XM-Mode has been set to true, or if either Xdb.Enabled or Xdb.Tracking.Enabled is set to false. If any of these conditions is met, the data is sent directly to SQL Server via the ISearchStore interface.

If none of these conditions is met, the assumption is made that XP is enabled and XConnect is fully functioning. This allows us to register events in the Sitecore tracker and delay processing them until the session ends (using the convertToXConnectEvent pipeline).
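The mode check described above can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than the module's own code; the TrackingSettings class and the use_direct_sql function are hypothetical names that simply mirror the three settings mentioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingSettings:
    """Hypothetical stand-in for the three configuration values described above."""
    xm_mode: bool               # the module's own XM-Mode switch
    xdb_enabled: bool           # mirrors Xdb.Enabled
    xdb_tracking_enabled: bool  # mirrors Xdb.Tracking.Enabled


def use_direct_sql(settings: TrackingSettings) -> bool:
    """Return True when data should go straight to SQL Server (XM-Mode).

    Any one of the three conditions is enough to bypass the tracker.
    """
    return (
        settings.xm_mode
        or not settings.xdb_enabled
        or not settings.xdb_tracking_enabled
    )
```

Only when all three checks pass (XM-Mode off, xDB enabled, tracking enabled) does the function return False, which corresponds to the deferred, tracker-based path.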

The first thing that we register is a SearchRanking Page Event (one of the custom events that are installed as part of the SearchAnalytics package). This event has no engagement value and is used only as a vessel to hold the ranking data until the end of session.
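As a rough illustration of how ranking data might travel inside a single event, the sketch below packs the top-ranked ItemIDs into one string and unpacks them again. It is written in Python for brevity and is not the module's code: the "|" delimiter and both function names are assumptions, since the post does not state how the string is formatted.

```python
from typing import List, Sequence, Tuple

MAX_RANKED_ITEMS = 20  # the top 20 ItemIDs are sent to the tracking service
DELIMITER = "|"        # assumption: the real separator is not stated in the post


def pack_ranking(item_ids: Sequence[str]) -> str:
    """Join the top-ranked item IDs, in rank order, into one string."""
    return DELIMITER.join(item_ids[:MAX_RANKED_ITEMS])


def unpack_ranking(data: str) -> List[Tuple[int, str]]:
    """Split the string back into (rank, item_id) pairs, with rank starting at 1."""
    if not data:
        return []
    return list(enumerate(data.split(DELIMITER), start=1))
```

Keeping the IDs in result order means the position in the string doubles as the rank, so no separate rank field needs to be carried on the event.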

We then register a Search Page Event, which comes out of the box with Sitecore (/sitecore/system/Settings/Analytics/Page Events/Search). By triggering this event, we unlock Sitecore’s own tracking and monitoring in Experience Analytics (see https://doc.sitecore.com/xp/en/users/103/sitecore-experience-platform/the-dimensions-and-metrics-in-experience-analytics-reports.html).

Internal search keywords are the words or phrases that a contact enters into the Search field on your website. This dimension is not case sensitive, so the interactions searching for "sitecore" and "Sitecore" appear in reports as "sitecore". The internal search keyword is triggered by the Search page event. Use the Internal search keyword dimension to analyze the actions and behavior of contacts that use different search keywords on your website.

(Source: Sitecore documentation)

Click Throughs

Finally, we need to link page visits, and the associated engagement data, back to the search term that led the user to visit the page. To do this, we hook into the processItem pipeline, which runs just before the page being visited is served back to the user. At this point in time, we have access to the HttpContext and with it the UrlReferrer. We check this to see if it contains a string that uniquely identifies a search listing page (this is defined as a setting called searchPageUrlPartial). Every time this is identified, we register a ClickThrough Page Event on the page currently being processed. When we come to process the event after the session ends, we then have access to the engagement data via the parent event.
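The referrer check can be sketched like this. It is a minimal Python illustration, not the module's implementation: the function names are hypothetical, the case-insensitive comparison is an assumption, and so is the idea that the search term can be read from a query-string parameter (the parameter name "q" is made up for the example).

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def is_search_click_through(referrer: Optional[str], search_page_url_partial: str) -> bool:
    """True when the referrer contains the configured search-page fragment.

    Assumption: the comparison ignores case.
    """
    if not referrer or not search_page_url_partial:
        return False
    return search_page_url_partial.lower() in referrer.lower()


def search_term_from_referrer(referrer: str, param: str = "q") -> Optional[str]:
    """Assumption: the term travels in a query-string parameter (hypothetical name 'q')."""
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else None
```

For example, with searchPageUrlPartial set to "/search?", a referrer of https://example.com/search?q=sitecore would count as a click through, while https://example.com/about would not.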

Processing the events

The benefit of storing the search data in page events is that we can defer processing the data, and making calls to SQL Server, until the user’s session has ended. This should avoid adding unnecessary latency during a visit.

When running in XP mode, Sitecore runs the convertToXConnectEvent pipeline on session end. By creating a custom processor and hooking into this pipeline, we can evaluate the data being processed and add our own custom code.

This code checks every page visit from a user’s tracker and looks for any of the events that we have registered. For each event that is detected, it uses the ISearchStore service to send data to the relevant table in SQL (via a stored procedure).

  • If a Search event is detected, the data is stored in the SingleSearches table, along with the user’s contactId if it is available.
  • If a SearchRanking event is detected, a check is made to see whether data has already been recorded for that search term on the current day. If not, the string of ItemIds is split into an array and a call is made to add each one as a new record in the SearchRankings table.
  • If a ClickThrough event is detected, a record is added to the ClickThroughs table with ItemId, SearchTerm, Date and also the length of the page visit (taken from the parent pageViewEvent).
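The three rules above can be sketched as a single dispatch loop. This is a simplified Python illustration under stated assumptions, not the module's processor: InMemorySearchStore is a hypothetical stand-in for ISearchStore (the real service calls stored procedures), the "|" delimiter is assumed, and storing a numeric rank alongside each ItemId is a guess at how order is preserved.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set, Tuple

DELIMITER = "|"  # assumption: separator used inside the SearchRanking data string


@dataclass
class PageEvent:
    name: str        # "Search", "SearchRanking" or "ClickThrough"
    search_term: str
    data: str = ""   # for SearchRanking: the delimited string of ItemIds


@dataclass
class PageVisit:
    item_id: str
    duration_ms: int
    events: List[PageEvent] = field(default_factory=list)


class InMemorySearchStore:
    """Hypothetical stand-in for ISearchStore, keeping rows in lists."""

    def __init__(self) -> None:
        self.single_searches: List[Tuple[str, Optional[str]]] = []
        self.search_rankings: List[Tuple[str, date, int, str]] = []
        self.click_throughs: List[Tuple[str, str, date, int]] = []
        self._ranked_days: Set[Tuple[str, date]] = set()

    def add_search(self, term: str, contact_id: Optional[str]) -> None:
        self.single_searches.append((term, contact_id))

    def has_ranking(self, term: str, day: date) -> bool:
        return (term, day) in self._ranked_days

    def add_ranking(self, term: str, day: date, rank: int, item_id: str) -> None:
        self._ranked_days.add((term, day))
        self.search_rankings.append((term, day, rank, item_id))

    def add_click_through(self, item_id: str, term: str, day: date, duration_ms: int) -> None:
        self.click_throughs.append((item_id, term, day, duration_ms))


def process_session(
    visits: List[PageVisit],
    store: InMemorySearchStore,
    day: date,
    contact_id: Optional[str] = None,
) -> None:
    """Walk every page visit in a finished session and route each known event."""
    for visit in visits:
        for event in visit.events:
            if event.name == "Search":
                store.add_search(event.search_term, contact_id)
            elif event.name == "SearchRanking":
                # Rankings are recorded at most once per term per day.
                if not store.has_ranking(event.search_term, day):
                    ids = event.data.split(DELIMITER)
                    for rank, item_id in enumerate(ids, start=1):
                        store.add_ranking(event.search_term, day, rank, item_id)
            elif event.name == "ClickThrough":
                # The visit length comes from the page view that owns the event.
                store.add_click_through(
                    visit.item_id, event.search_term, day, visit.duration_ms
                )
```

Running process_session twice for the same term on the same day adds two rows to single_searches but leaves search_rankings unchanged, which is the once-per-day behaviour described in the second bullet.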

Summary

In this post we looked at how the data for the Search Analytics module is captured. In the next post, we will look at how we surface that data in a useful format, including how we create a search report and allow users to access it from both the Launchpad and an individual item in the Content Editor.
