A Simple Recommender Systems - Collaborative Network Library
Introduction
Do you find the "customers that bought this item also bought these items" feature of sites such as Amazon.com useful? Would you like to add this capability to your e-commerce web site in order to increase cross-sell opportunities? Perhaps you are just interested in recommending pages to visitors browsing your site. Regardless of the specific type of item you want to recommend or cross-sell, the technique can be the same. There are many algorithms that could be leveraged to implement a recommender system. In this article I will describe one such implementation based on a straight-forward graph structure and a trivial algorithm.
Armed with the content of this article and the downloadable source code you will be able infuse a simple recommender algorithm into your web site.
This article is organized as follows:
Background information (a quick explanation of the theory of recommender systems)
Library Overview
How to use the library
Implementation Details
Closing Remarks
References
Background
Recommending systems are the bread and butter of large e-commerce web sites. Potential customers are presented with highly relevant product recommendations based on their demonstrated likes; thus, increasing the probability of purchase. Customers purchasing one or more items are presented with like-items increasing the likelihood of cross-selling. Returning customers are retained by the quality of recommendations based on their previous purchases. In general, the goal of any recommendation system is to present user with a highly relevant set of items.
There are two general types of algorithms for recommendation. One is based on the similarity between the currently active user and previous users. The similarity is usually based on the items each user demonstrated interest (or purchased) on. The other type is based on the similarity of items selected by the current user to other items. Usually described as user-based and item-based collaborative filtering, respectively, these algorithms differ in the data collected and the associations formed between the data items.
Both type of systems can be further classified as either memory- or model-based collaborative filters. The memory-based approach relies on the entire database of observations being in memory when a recommendation is requested. The request is then satisfied by searching for neighbors that exhibit similar behavior and then combining the preferences of these neighbors into a set of ranked items. The model-based systems relies on building a model offline (usually by means of a machine learning algorithm) then using the learned model to classify the user and recommend relevant items.
The Collaborative Network Library described in this article is a simplistic item-based, memory-based, collaborative filtering algorithm. It is not a probabilistic approach and is solely based on observed associations between pairs of items; thus, it does not handle unobserved items or features. This means that if the user selects an item that has not been a selection of any previous users, then the system will not be able to produce a recommendation. For an item to be part of a recommendation it must have been associated with other items by one or more previous users.
Library Overview
There are two major aspects to a memory-based collaborative filter. The first involves updating the in-memory representation user interaction with your products or site pages. The second is to obtain a recommendation. This is no different in the case of the Collaborative Network Library described in this article. Both steps are achieve via a single class. The primary interaction between your code and the library will be via the ItemGraphSingleton class in the Platypus.Collaborator namespace. The action of updating the item-based graph is accomplished by invoking the void AddLink( string strSource, string strTarget) method, where strSource is the previously viewed product and strTarget is the newly requested/viewed/selected item.
In the case of the Collaborative Network Library, the in-memory data collected is based on the
sequence in which items are viewed or purchased. In order to capture any meaningful association
a user must view or purchase more than one item. However, in order to mitigate this requirement
you could always provide a dummy start point for all associations. For example, if a user visits
your site and views or purchases a single item name 'foo product', you could create a link
between a dummy item (say, 'home') and the product purchased. This allows the library to
recommend items from the beginning of any user session.
Obtaining a recommendation is done via the Item[] NextItems( string strItem, int nTopN ) method. It returns an array of, at most nTopN Item objects in descending ranking order. These are instance methods, so they must be accessed via the static ItemGraphSingleton.GetInstance() method. These are the two primary operations on the library.
In addition to the primary tasks of updating the graph and retrieving recommendations one must also persist the graph to a durable media since all updates are in-memory. This can be achieved in many different ways. For example, we could persist the graph to an SQL Server database or to disk. The current implementation (Version 1.0) supports persistance to disk using the .Net binary serialization mechanism. For this purpose the utility class ItemGraphSerializer is provided. This class has two overloaded serialization and deserialization methods. More details on this class are provided in the next section, but suffice it to say here that these methods can be invoked at regular time intervals in order to persist the latest state of the graph to a file on disk.
How to use the library
Now that we have covered the general, let's get into the specifics. In this section I will show you how to incorporate the Collaborative Network Library into your web site in a minimally intrusive manner.
Let's start by the initialization code. In the context of a web site you might want to load a previously serialized version of the graph in the application startup code. This can be done in the Global.Application_Start method as such:
using Platypus.Collaborator;
using System.IO;
...
SerializationResult sr;
fileName = Server.MapPath(@"\collab_graph.graph");
if( File.Exists( fileName ) )
{
sr = Platypus.Collaborator.ItemGraphSerializer.Deserialize(fileName);
if( !sr.Succeeded )
{
// warning, not necessarily an error
// log sr.ErrorMessage
}
}
Correspondingly, one could serialize the graph to disk on the application end event:
using Platypus.Collaborator;
using System.IO;
...
protected void Application_End(Object sender, EventArgs e)
{
fileName = Server.MapPath(@"\collab_graph.graph");
Platypus.Collaborator.ItemGraphSerializer.Serialize( fileName );
}
With this code in place the graph is serialized to disk everytime the application shuts down and deserialized on start up. Of course, you could decide on a different serialization schedule; for example, when each session terminates.
With serialization out of the way, we can move to the interesting part of the library - obtaining recommendations and updating the graph. In the General Steps section I briefly described the two methods involved in this process. Using these two methods in your application is simple. The AddLink(string strSource, string strTarget) method takes two parameters, a source and a target item. The source is the item previously selected by the current user. The target is the item the user has just requested. By calling this method with these two parameters we are instructing the graph library to update the directed association from item strSource to item strTarget, or to create it if the association has not been observed before. This implies that you are maintaining session information about the current user, a common practice in e-commerce sites. However you are maintaining the state of the current user, you will need to identify the sequence in which items are selected.
In a typical scenario, you will respond to a user's choice to select an item via some server-side event handler. The action of updating the graph and obtaining a recommendation can be combined in such a handler. Let's assume that you are using a DataGrid to display recommended items and that the user's previous selection is stored in session. The following code demonstrate how to use the AddLink() method and obtain a set of recommendations.
protected void UpdateGraphAndRecommendNextItems( string strPrev, string strNext )
{
//
// Update the graph
//
ItemGraphSingleton.GetInstance().AddLink( strPrev, strNext );
//
// Get a recommendation of at most 5 items
//
Item[] items = ItemGraphSingleton.GetInstance().NextItems(strNext , 5 );
//
// Bind the recommended items to the DataGrid that displays them.
//
dgRecommendedItems.DataSource = items;
dgRecommendedItems.DataBind();
}
And that's all there is to it.
Implementation Details
So far I have given you a broad overview of collaborative filtering and a bird's eye view of a simple implementation, namely the Collaborative Network Libray. In this section I will go into inner workings of the library and the assumptions I made in how it would be used.
First, let us look at the static structure of the libray. The following UML diagram depicts the relationship between the classes in the library.
The first thing to notice is that there is an ItemGraph and an ItemGraphSingleton class. The singleton is simple wrapper around the ItemGraph class. It simply enforces a single instance of the ItemGraph to exist in your application domain. You could, if you chose to, instantiate an ItemGraph object and manage its lifetime yourself. The singleton is provided for convinience.
Once the concept of generics is incorporated in the .Net framework (version 2.0 due out this year) this design could be improved by using a generic singleton template whose type is ItemGraph [1].
One important aspect of the ItemGraph class is that since its instance is shared across multiple sessions its operations must be thread-safe. To this end you will see that the void AddLink() method synchronizes the process of updating the internal structure of the graph.
Internally, the graph is represented as an adjacency list [3]. The ItemGraph object holds a collection of
Item objects added via the AddLink() method. As new items are encountered, they are
added to this collection. The collection is a strongly-typed and is based on the
System.Collection.Collection class. So how exactly is this a graph? Each Item object
has a collected of DirecteEdge objects. A DirectedEdge object contains a reference to another Item
as well as a weight. As the name implies each DirectedEdge object represents a directional link
between the containing Item and the Item referenced by the DirectedEdge (property of type
Item named Target). The Weigth property represents the
number of time an association between these two items has been added. In the context of an
e-commerce site this could represent the number of times customers have shown interest in the
source item followed by the target item. The following figure demonstrate graphically a
hypothetical instance of the graph.
In this hypothetical graph the first item in the ItemGraph collection of items would be Item A.
This Item object would have a collection of three DirectedEdge objects pointing to Items C, G and E,
with weights 3, 2, 3 respectively.
The current version has a ver simplistic approach for recommending items. It finds the requested Item
in the Graph and then enumerates its outward links, up to the requested number. A potentially better
approach, specially early in the evolution of the graph, would be to implement a depth traversal
algorithm to find the 'heaviest' path with the number of requested items. For example, if the requested
item has only 2 outward links and the request is for 4 the algorithm would inspect the children of
the 2 outward links and find the heaviest of them. This approach can include traversal cost that
reduces the weight of outbound links based on the number of steps from the requested node. This
adjusted weight would represent the indirect association between the target in the link and the
requested node based on their separation.
The remaining graph-related aspects of the library are fairly simple. A cursory inspection of the
source code would yield a much better understanding than the written word.
A brief description of the serialization code is in order. The class ItemGraphSerializer
encapsulates the hydration and dehydration of the graph to and from a file stream. This process
leverages the .Net binary serialization mechanism. The ItemGraphSerializer.Serialize()
method takes a file name and a reference to an ItemGraph object. If you are interacting with the
graph via the ItemGraphSingleton class, then it provides a public property Graph to
access its private ItemGraph instance. When this method is invoked a file stream
(System.IO.FileStream)is opened with FileMode.Create and
FileAccess.Write. Then a BinaryFormatter is used to serialize the ItemGraph object
to this stream. Deserialization of a binary ItemGraph stream is similarly performed using
a BinaryFormatter via a FileStream. The ItemGraphSerializer.Deserialize()
supports this operation. This method takes a reference to an ItemGraph variable to which is deserializes
the content of the file. A SerializationResult structure is returned which specifies the
results of this process. If a failure was encountered, the structure's Succeeded boolean
member will be set to false, otherwise, true.
Closing Remarks
In this article I have introduced a simple collaborative filtering recommending system using
a graph-centric design. It is simple to use and does not require any significant effort to
integrate into a .Net web site. It offers a file-based persistance model which can be easily
modified to be SQL Server based.
This library can be used to recommend product in an e-commerce site or to help users browse
relavant pages or content on a site. It generically evolves weighted relationships between
pairs of items forming a cyclical associative graph.
A simple recommendation algorithm is included which simply returns the items associated or
referenced from the selected item in the graph.
Future improvements to the recommending algorithm were alluded to in the previous section.
A subsequent release of this product will include code to write out graph visualization
data. To learn more about this future enhancement visit our site at
http://www.platypus-tech.com.
References
[1] More Effective C++: 35 New Ways to Improve Your Programs and Designs, Scott Meyers
[2] Item-based Collaborative Filtering Recommendation Algorithms; Sarwar et al.
[3] Introduction To Algorithms; Cormen, Leiserson, and Rivest