Sitecore Page Recommender: Building our Page Recommendation ML Model


This blog post is part of a larger series that looks at the steps to create a
Page Recommendation Engine for Sitecore.


Introduction

To get the additional functionality described at the end of the last post (How to build a machine learning model), we need to manually piece together the individual methods that prepare the data, train the model and evaluate it. This is covered in much more detail in the following article:

Tutorial: Build a movie recommender – matrix factorization – ML.NET

To begin:

1. Create a new console app solution.


2. Select “.NET 6.0”


3. Add the following NuGet packages:

  • Microsoft.ML
  • Microsoft.ML.Recommender
  • Microsoft.Extensions.Logging

4. Create a new class file called PageRatingData.cs and paste in the following:

using Microsoft.ML.Data;

namespace ml_recommender
{
    public class PageEngagement
    {
        public string ContactId { get; set; }
        public string PageId { get; set; }
        public float Engagement { get; set; }
        public static PageEngagement FromCsv(string csvLine)
        {
            string[] values = csvLine.Split(',');
            PageEngagement pageEngagement = new PageEngagement();
            pageEngagement.ContactId = values[0];
            pageEngagement.PageId = values[1];
            pageEngagement.Engagement = float.Parse(values[2]);
            return pageEngagement;
        }
    }
    public class PageRating
    {
        [LoadColumn(0)]
        public string contactId;
        [LoadColumn(1)]
        public string pageId;
        [LoadColumn(2)]
        public float Label;
    }
    public class PageRatingPrediction
    {
        public float Label;
        public float Score;
    }
}

The first class represents the data being passed to the service. Note: the ‘FromCsv’ method is only used by our console app to provide test data.

The second class, PageRating, is the input data class for the machine learning model. The LoadColumn attribute specifies which column of the dataset each field is loaded from. The first two columns (contactId and pageId) are the “Features”, i.e. the inputs used to make a prediction. The third column holds the value we want to predict (the engagement); convention dictates that this should be called Label.

The third class represents the predicted result.
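The FromCsv method above assumes that every line has three comma-separated fields and that the engagement value parses under the machine's current culture. As a rough sketch of a more forgiving version, assuming the same column order (contact id, page id, engagement): TryParseEngagement is a hypothetical helper name, and it returns a tuple rather than a PageEngagement only so that the snippet compiles on its own.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses "contactId,pageId,engagement" and reports failure
// instead of throwing. Returns a tuple so the snippet is self-contained.
static bool TryParseEngagement(string csvLine, out (string ContactId, string PageId, float Engagement) row)
{
    row = default;
    if (string.IsNullOrWhiteSpace(csvLine)) return false;

    string[] values = csvLine.Split(',');
    if (values.Length < 3) return false;

    // InvariantCulture keeps "0.75" meaning 0.75 even on machines that use ',' as the decimal separator.
    if (!float.TryParse(values[2].Trim(), NumberStyles.Float, CultureInfo.InvariantCulture, out float engagement))
        return false;

    row = (values[0].Trim(), values[1].Trim(), engagement);
    return true;
}

// Illustrative values only, not taken from the real training file.
bool ok = TryParseEngagement("contact-1,page-a,0.75", out var parsed);
bool bad = TryParseEngagement("contact-1,page-a,not-a-number", out _);

Console.WriteLine($"{ok}: {parsed.ContactId} / {parsed.PageId}");
Console.WriteLine(bad);
```

Lines that fail to parse can then be skipped or logged rather than stopping the whole training run.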

5. Create a class called PageRecommender.cs and paste in the following:

using Microsoft.Extensions.Logging;
using Microsoft.ML;
using Microsoft.ML.Trainers;

namespace ml_recommender
{
    public class PageRecommender
    {
        private string _modelPath = Path.Combine("c:/pagerecommender/", "Data", "PageRecommenderModel.zip");
        private ILogger _logger;
        public List<PageRating> _trainingData { get; set; }
        public List<PageRating> _testData { get; set; }
        public PageRecommender(ILogger logger) { _logger = logger; }
        
        // ADD Methods here
    }
}

6. Add the following methods:

The Train method is the entry point to the service. The Cortex engine will collate the data, prepare it as a list of PageEngagement objects and pass it to the service via this method.

public void Train(List<PageEngagement> data)
{
    MLContext mlContext = new MLContext();
    PrepareData(data);
    (IDataView trainingDataView, IDataView testDataView) = LoadData(mlContext);
    ITransformer model = BuildAndTrainModel(mlContext, trainingDataView);
    EvaluateModel(mlContext, testDataView, model);
    SaveModel(mlContext, trainingDataView.Schema, model);
}

The PrepareData method splits the list of PageEngagement objects into two lists: one for training and one for testing. Standard convention is to use a ratio of 80:20.

private void PrepareData(List<PageEngagement> data)
{
    _trainingData = new List<PageRating>();
    _testData = new List<PageRating>();
    var size = data.Count;
    // The first 80% of the rows (rounded up) are used for training.
    var trainSize = (int)Math.Ceiling(size * 0.8);
    for (int i = 0; i < trainSize; i++)
    {
        _trainingData.Add(new PageRating() { contactId = data[i].ContactId, pageId = data[i].PageId, Label = data[i].Engagement });
    }
    // The remaining rows, up to and including the last one, are used for testing.
    for (int i = trainSize; i < size; i++)
    {
        _testData.Add(new PageRating() { contactId = data[i].ContactId, pageId = data[i].PageId, Label = data[i].Engagement });
    }
}
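PrepareData cuts the list in file order, so if the export happens to be sorted (by contact, say), the test rows may not be representative of the training rows. One possible refinement is to shuffle before cutting. The following is a minimal sketch, not part of the service itself: ShuffleSplit is a hypothetical helper, the seed value is arbitrary, and plain strings stand in for the PageEngagement rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: shuffles a copy of the rows (Fisher-Yates) and then
// cuts it at the given training fraction. A fixed seed keeps the split repeatable.
static (List<T> Train, List<T> Test) ShuffleSplit<T>(IReadOnlyList<T> rows, double trainFraction = 0.8, int seed = 1)
{
    var shuffled = rows.ToArray();
    var rng = new Random(seed);
    for (int i = shuffled.Length - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1); // 0 <= j <= i
        (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
    }

    int trainSize = (int)Math.Ceiling(shuffled.Length * trainFraction);
    return (shuffled.Take(trainSize).ToList(), shuffled.Skip(trainSize).ToList());
}

// Ten placeholder rows stand in for the PageEngagement list.
var rows = Enumerable.Range(1, 10).Select(i => $"row-{i}").ToList();
var (train, test) = ShuffleSplit(rows);

Console.WriteLine($"train: {train.Count}, test: {test.Count}"); // train: 8, test: 2
```

ML.NET also provides mlContext.Data.TrainTestSplit, which could replace the manual split altogether if the data is loaded into an IDataView first.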

The prepared data is then loaded into the ML context within the LoadData method. The feature columns (i.e. what we want to use to make a prediction) are identified, and DataViews are prepared for both the training and test sets.

private (IDataView training, IDataView test) LoadData(MLContext mlContext)
{
    var trainingData = mlContext.Data.LoadFromEnumerable(_trainingData);
    var testData = mlContext.Data.LoadFromEnumerable(_testData);
    var pipeline = mlContext.Transforms.Concatenate("Features", "contactId", "pageId");
    // Fit the transform once, on the training data, and apply it to both sets.
    var transformer = pipeline.Fit(trainingData);
    IDataView trainingDataView = transformer.Transform(trainingData);
    IDataView testDataView = transformer.Transform(testData);
    return (trainingDataView, testDataView);
}

The BuildAndTrainModel method uses the matrix factorization trainer that comes with the Microsoft.ML.Recommender NuGet package to build and train the model (note that this is the same approach that was identified as the most appropriate by the GUI in the previous post).

private ITransformer BuildAndTrainModel(MLContext mlContext, IDataView trainingDataView)
{
    IEstimator<ITransformer> estimator = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "contactIdEncoded", inputColumnName: "contactId")
    .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "pageIdEncoded", inputColumnName: "pageId"));
    var options = new MatrixFactorizationTrainer.Options
    {
        MatrixColumnIndexColumnName = "contactIdEncoded",
        MatrixRowIndexColumnName = "pageIdEncoded",
        LabelColumnName = "Label",
        NumberOfIterations = 20,
        ApproximationRank = 100
    };
    var trainerEstimator = estimator.Append(mlContext.Recommendation().Trainers.MatrixFactorization(options));
    ITransformer model = trainerEstimator.Fit(trainingDataView);
    return model;
}
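To give a feel for what matrix factorization is doing, here is a toy version of the idea: every contact and every page gets a small vector of numbers (its factors), and the predicted engagement is the dot product of the two. This is only an illustrative sketch trained with plain stochastic gradient descent, not ML.NET's implementation; the data, the learning rate and the regularisation value are all made up, and the integer indices play the role of the encoded keys.

```csharp
using System;

// Toy observations: (contact index, page index, engagement). Values are invented.
var observed = new (int Contact, int Page, double Engagement)[]
{
    (0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)
};
const int contactCount = 3, pageCount = 3, rank = 2;
const double learningRate = 0.01, regularisation = 0.01;

var rng = new Random(42);
double[][] contactFactors = RandomFactors(contactCount);
double[][] pageFactors = RandomFactors(pageCount);

double errorBefore = MeanSquaredError();
for (int epoch = 0; epoch < 500; epoch++)
{
    foreach (var (c, p, y) in observed)
    {
        // Nudge both factor vectors in the direction that reduces this row's error.
        double err = y - Score(c, p);
        for (int k = 0; k < rank; k++)
        {
            double cf = contactFactors[c][k], pf = pageFactors[p][k];
            contactFactors[c][k] += learningRate * (err * pf - regularisation * cf);
            pageFactors[p][k] += learningRate * (err * cf - regularisation * pf);
        }
    }
}
double errorAfter = MeanSquaredError();

Console.WriteLine($"MSE before: {errorBefore:F3}, after: {errorAfter:F3}");
// Contact 0 has no recorded engagement with page 2, but the factors still give a score.
Console.WriteLine($"Predicted engagement for contact 0 on page 2: {Score(0, 2):F2}");

double[][] RandomFactors(int count)
{
    var factors = new double[count][];
    for (int i = 0; i < count; i++)
    {
        factors[i] = new double[rank];
        for (int k = 0; k < rank; k++) factors[i][k] = rng.NextDouble();
    }
    return factors;
}

double Score(int contact, int page)
{
    double sum = 0;
    for (int k = 0; k < rank; k++) sum += contactFactors[contact][k] * pageFactors[page][k];
    return sum;
}

double MeanSquaredError()
{
    double total = 0;
    foreach (var row in observed) total += Math.Pow(row.Engagement - Score(row.Contact, row.Page), 2);
    return total / observed.Length;
}
```

In this sketch, rank corresponds loosely to ApproximationRank in the options above and the epoch count to NumberOfIterations: more factors and more passes let the model fit the observed values more closely.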

The EvaluateModel method then uses the test data set to compare the values that the model predicts against the values provided in the test data. Two metrics are calculated: Root Mean Squared Error and RSquared. The closer the Root Mean Squared Error is to zero, and the closer RSquared is to one, the more accurate the model.

private void EvaluateModel(MLContext mlContext, IDataView testDataView, ITransformer model)
{
    var prediction = model.Transform(testDataView);
    var metrics = mlContext.Regression.Evaluate(prediction, labelColumnName: "Label", scoreColumnName: "Score");

    _logger.LogInformation("Root Mean Squared Error: {RootMeanSquaredError}", metrics.RootMeanSquaredError);
    _logger.LogInformation("RSquared: {RSquared}", metrics.RSquared);
}
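To make the two metrics less abstract, they can be worked out by hand on a tiny example. This is a sketch using the standard textbook definitions (RMSE as the square root of the mean squared difference, RSquared as one minus the ratio of residual to total variation); RegressionMetrics is a hypothetical helper and the numbers are invented.

```csharp
using System;
using System.Linq;

// Hypothetical helper: computes both metrics from matching arrays of actual and predicted values.
static (double Rmse, double RSquared) RegressionMetrics(double[] actual, double[] predicted)
{
    double mean = actual.Average();
    double residualSum = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
    double totalSum = actual.Sum(a => (a - mean) * (a - mean));
    return (Math.Sqrt(residualSum / actual.Length), 1 - residualSum / totalSum);
}

double[] engagement = { 1, 2, 3, 4 };

// A model that predicts every value exactly.
var perfect = RegressionMetrics(engagement, new double[] { 1, 2, 3, 4 });
// A model that always predicts the average engagement (2.5).
var meanOnly = RegressionMetrics(engagement, new double[] { 2.5, 2.5, 2.5, 2.5 });

Console.WriteLine($"perfect:   RMSE {perfect.Rmse}, RSquared {perfect.RSquared}");       // RMSE 0, RSquared 1
Console.WriteLine($"mean only: RMSE {meanOnly.Rmse:F3}, RSquared {meanOnly.RSquared}"); // RSquared 0
```

A perfect model scores an RMSE of 0 and an RSquared of 1, while a model that only ever guesses the average scores an RSquared of 0, so an RSquared near zero is a sign that the model has learned very little.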

The SaveModel method saves the trained model to the file system (i.e. c:\pagerecommender\Data\PageRecommenderModel.zip).

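private void SaveModel(MLContext mlContext, DataViewSchema trainingDataViewSchema, ITransformer model)
{
    // Make sure the target folder exists before writing the model file.
    Directory.CreateDirectory(Path.GetDirectoryName(_modelPath)!);
    mlContext.Model.Save(model, trainingDataViewSchema, _modelPath);
}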

The final method, Predict, is the one that will be used to consume the service. It loads the model that was trained and saved to the file system in the previous steps, then creates a PageRating object and passes it to the prediction engine.

public float Predict(string cId, string pgId)
{
    DataViewSchema inputSchema;
    MLContext mlContext = new MLContext();
    ITransformer model = mlContext.Model.Load(_modelPath, out inputSchema);
    var predictionEngine = mlContext.Model.CreatePredictionEngine<PageRating, PageRatingPrediction>(model);
    var input = new PageRating { contactId = cId, pageId = pgId };
    var result = predictionEngine.Predict(input).Score;
    return result;
}

Testing our service

In order to test that the new PageRecommender service is behaving as we would expect, we can call it via a console application. To begin, make sure you have the training data stored on the file system at c:\data\training.csv (see attached file).

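Create a new console application, add the Microsoft.Extensions.Logging.Console NuGet package (it provides the AddConsole method used below) and paste in the following code: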

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using ml_recommender;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Configuration;
using Microsoft.Extensions.Logging.Console;

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.AddConsole();
});

var logger = loggerFactory.CreateLogger<Program>();

List<PageEngagement> values = File.ReadAllLines("C:\\Data\\training.csv")
                                   .Skip(1)
                                   .Select(v => PageEngagement.FromCsv(v))
                                   .ToList();
var pageRecommender = new PageRecommender(logger);
pageRecommender.Train(values);
var result = pageRecommender.Predict(cId: "11092a8d-255f-0300-0000-0677b4fc5e5e", pgId: "196ab2ea-b0af-4a63-946c-345885285c66");
Console.WriteLine($"Page Recommendation Service predicted an engagement value of: {result}");
Console.WriteLine("Press any key to exit");
Console.ReadKey();

Summary

We now have a service that will analyse a set of data to create a machine learning model. Once the model is trained, we can then call it repeatedly for each contact that we have engagement data for, and ask for a prediction for every page we have available.

From that information, we can order the results and list the pages with the highest predicted engagement values for each contact.
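The ordering step could look something like the sketch below. TopPages is a hypothetical helper, the predict delegate stands in for pageRecommender.Predict, and the scores are made up so that the snippet runs without a trained model. It also assumes that a page the model cannot score comes back as NaN, and filters those out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: scores every candidate page for one contact and keeps the best few.
// 'predict' stands in for pageRecommender.Predict(contactId, pageId).
static List<(string PageId, float Score)> TopPages(
    string contactId, IEnumerable<string> pageIds, Func<string, string, float> predict, int count)
{
    return pageIds
        .Select(pageId => (PageId: pageId, Score: predict(contactId, pageId)))
        .Where(r => !float.IsNaN(r.Score)) // assumption: unscorable pairs come back as NaN
        .OrderByDescending(r => r.Score)
        .Take(count)
        .ToList();
}

// Stand-in scores; the page names and values are invented for illustration.
var fakeScores = new Dictionary<string, float>
{
    ["home"] = 0.2f, ["pricing"] = 0.9f, ["blog"] = 0.5f, ["careers"] = float.NaN
};
var top = TopPages("contact-1", fakeScores.Keys, (contact, page) => fakeScores[page], 2);

foreach (var (pageId, score) in top)
    Console.WriteLine($"{pageId}: {score}");
```

With the real service, the lambda would simply be replaced by pageRecommender.Predict, and the loop repeated for each contact.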

Now that we have a machine learning model to call, we need to be able to:

  • Trigger page-level goals and events (Page Engagement)
  • Retrieve those events from XConnect and merge the records to the format expected by our service
  • Store the resulting recommendations

Next up in the series: Page Engagement: Trigger page level goals and events
