Recommendation systems are found under the hood of many popular services and websites. E-commerce and retail companies use them to increase sales, music services use them to surface interesting songs for their listeners, and news sites use them to rank the daily articles according to their readers’ interests. If you really think about it, recommendation systems can be used in pretty much every area of daily life. For example, why not automatically recommend better choices to house investors, guide your friends around your hometown without you being there, or suggest which company to apply to if you are looking for a job?
All pretty cool stuff, right?
But recommendation systems need to be a lot smarter than plain old vanilla software. In fact, the engine is made up of multiple machine learning modules that aim to rank the items of interest for each user, based on the users’ preferences and the items’ properties.
In this blog series, you will gain some insight into how recommendation systems work, how you can harness Google Cloud Platform to build scalable systems, and the architecture we used when implementing our music recommendation engine on the cloud. This first post is a light introduction to the overall system, and my follow-up articles will deep dive into each of the machine learning modules and the tech that powers them.
Recommendation systems & ML
Recommendation systems are data-oriented modules. The recommender needs enough relevant data to work. We also need a good (usually mathematical) model that can emulate the user-item interactions. This is because the interactions differ from user to user and are not simple enough to be captured by hand-written logical rules. So the common approach is to learn a machine learning model that captures behavioural patterns and generalises them to unseen cases.
The recommender can be seen as a query-response module that retrieves a ranked list of relevant or interesting items in response to a query. Here, the query can be seen as predicting the next item the user interacts with, given the user’s previously interacted items, and possibly some other available information about the user.
The above paragraph may be a bit abstract to some, so let me clarify it by talking about our music recommendation engine. Here, the task is to recommend songs (items) that might be of interest to a user whose previously listened songs we know, and about whom we might have some more information such as gender, age, other demographic information, genres of interest, etc. In addition to the query (user information), some information about the songs is usually available. The task can be seen as predicting the next song the user is willing to listen to, given the user’s previously listened songs. As you’ve probably already noticed, this is not something an explicitly programmed piece of software can solve. Instead, we use a more complicated model that learns the interactions and is able to predict the songs a user will listen to. That’s where machine learning techniques come in handy!
Now, let’s break down the process of implementing a recommendation system from the machine learning perspective.
The training phase involves learning and building the recommender model. For this, a dataset of user-item interactions is needed. The dataset may also come with the users’ preferences and features, or the items’ characteristics. The training phase uses this training dataset, which consists of queries and their responses, to learn a model that correctly predicts the response to a query and generalises the implicit user-item patterns to queries that are not present in the data. This process is the focus of machine learning specialists, and the end goal is a model that can generalise the user-item interactions to unforeseen queries.
After the training phase is done, a model is obtained which is able to reply to the queries with similar structure as in the training dataset. Note that the queries do not need to be exactly the same as the ones we trained our model on, but should be similar in structure and nature.
With the trained recommender in hand, the next step is to reply to queries. The process of using a learned model to retrieve responses to queries is called the prediction phase. The training phase is usually very time-consuming, but once the model is trained, it can be used as a stand-alone prediction module that quickly returns a list of recommended items to the user.
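To make the two phases a bit more concrete, here is a minimal sketch in plain Python. It is not the model we use later in the series, just a toy stand-in that I am assuming for illustration: "training" counts how often one song directly follows another in some hypothetical listening histories, and "prediction" uses those counts to rank songs for a query.

```python
from collections import Counter, defaultdict


def train(histories):
    """Training phase: count how often one song directly follows another."""
    follows = defaultdict(Counter)
    for songs in histories:
        for prev_song, next_song in zip(songs, songs[1:]):
            follows[prev_song][next_song] += 1
    return follows


def predict(model, listened, top_k=3):
    """Prediction phase: rank unheard songs by how often they followed the listened ones."""
    scores = Counter()
    for song in listened:
        scores.update(model.get(song, {}))
    for song in listened:
        scores.pop(song, None)  # do not recommend what was already listened to
    return [song for song, _ in scores.most_common(top_k)]


# Hypothetical listening histories (the "training dataset").
histories = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["b", "c", "d"],
    ["a", "c"],
]
model = train(histories)       # the slow part, done once
print(predict(model, ["a"]))   # the fast part, done per query: ['b', 'c']
```

Note that the query `["a"]` never appears as a full history in the training data. It only needs to have the same structure (a list of known songs) for the model to answer it.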
As we will soon see, there are some simpler models that do not require the training phase. These models can be used as the prediction module, but are usually not as good as the machine learning based models.
So far I’ve talked about the two phases of making a machine learning recommender. In this section, we classify the recommendation computation module into three categories:
- content-based filtering
- collaborative filtering
- hybrid recommendation models
Content-based filtering focuses on the content of the items and tries to build a user profile from the information it has about the items that the user has interacted with. For instance, if the genres of the listened songs are available, the user profile can keep track of how interested the user is in each genre. Or, if we have a good representation of the songs, the user profile can model the nature of the songs that are most appealing to the user.
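As a rough sketch of the genre idea, a user profile could simply be the share of listens that falls into each genre. The function name and the sample genres below are made up for illustration.

```python
from collections import Counter


def genre_profile(listened_genres):
    """Return the share of listens per genre, as a simple user profile."""
    counts = Counter(listened_genres)
    total = sum(counts.values())
    if total == 0:
        return {}  # no listens yet, so no profile
    return {genre: count / total for genre, count in counts.items()}


# Hypothetical genres of the songs one user has listened to.
profile = genre_profile(["rock", "rock", "jazz", "rock"])
print(profile)  # {'rock': 0.75, 'jazz': 0.25}
```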
For prediction, we recommend the items that are assumed to be the most interesting to the user. The level of interest is usually computed with a similarity metric.
For the music recommendation, let’s assume that we can describe the nature of a song: whether it is fast or slow, whether it has vocals, whether the singer is male or female, which instruments are used, etc. Let’s also assume that all this information is encoded as a set of numbers (a vector of numbers called a feature vector). We can then keep the feature vectors of the listened songs as the user’s profile. For prediction, we compute the similarity of a candidate song to each of the previously listened songs, take the average as the similarity/interest score, sort the candidates by that score, and recommend the songs most similar to the ones previously listened to.
Looking at this example, it should be clear that the prediction phase of content-based filtering requires two things: a feature vector for each item, and a distance/similarity metric to compare them.
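The example can be sketched in a few lines of plain Python. I am assuming cosine similarity as the metric here (other metrics would work too), and the three-number feature vectors are invented for illustration rather than taken from any real dataset.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(profile, catalogue, top_k=2):
    """Rank unheard songs by their average similarity to the listened songs."""
    if not profile:
        return []  # nothing to compare against
    scores = {}
    for song, features in catalogue.items():
        if song in profile:
            continue  # skip songs the user already listened to
        sims = [cosine(features, listened) for listened in profile.values()]
        scores[song] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical feature vectors: [tempo, vocals, acoustic].
catalogue = {
    "fast_rock": [0.9, 1.0, 0.1],
    "mid_pop": [0.6, 1.0, 0.4],
    "slow_piano": [0.1, 0.0, 1.0],
    "punk": [1.0, 1.0, 0.0],
}
# The user's profile is just the feature vectors of the listened songs.
profile = {"fast_rock": catalogue["fast_rock"]}
print(recommend(profile, catalogue))  # ['punk', 'mid_pop']
```

No training happens anywhere in this sketch. All of the work is done at prediction time, which is why good feature vectors matter so much for this family of models.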
Still with me? Great. Then let’s move on, shall we?
Content-based filtering is geared towards the items’ features. In comparison, collaborative filtering concentrates on the user-item interactions. These models may not require content information or features for items and users; instead, they concentrate on modeling the ratings. Collaborative filtering can compute the similarity between two users by comparing their ratings and, likewise, can compute item similarity to group items together by how users interact with them.
Collaborative filtering can be seen as filling in a user-item matrix. Each row of the matrix corresponds to a user, and each column to an item. The entry (u, i) of the matrix holds the rating user u has given to item i. In other configurations it can be 1 if user u has chosen item i, or 0 otherwise. For instance, if user u has listened to song i, the (u, i) entry of the matrix will be 1, and if we know that the user has not listened to it, the value will be 0. Collaborative filtering tries to fill in the entries of the matrix that are unknown.
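Here is a small sketch of such a matrix in plain Python, built from a hypothetical list of (user, song) listening events. I use `None` to mark the unknown entries, which is just my choice for illustration. Those are the cells a collaborative filtering model would try to fill.

```python
def build_matrix(listens, users, songs):
    """Build a user-item matrix: 1 = listened, None = unknown (to be predicted)."""
    user_index = {user: row for row, user in enumerate(users)}
    song_index = {song: col for col, song in enumerate(songs)}
    matrix = [[None] * len(songs) for _ in users]
    for user, song in listens:
        matrix[user_index[user]][song_index[song]] = 1
    return matrix


def unknown_entries(matrix, users, songs):
    """List the (user, song) pairs the recommender would need to fill in."""
    return [
        (users[row], songs[col])
        for row, values in enumerate(matrix)
        for col, value in enumerate(values)
        if value is None
    ]


# Hypothetical users, songs and listening events.
users = ["ann", "bob"]
songs = ["s1", "s2", "s3"]
listens = [("ann", "s1"), ("ann", "s3"), ("bob", "s2")]

matrix = build_matrix(listens, users, songs)
print(matrix)                                  # [[1, None, 1], [None, 1, None]]
print(unknown_entries(matrix, users, songs))   # [('ann', 's2'), ('bob', 's1'), ('bob', 's3')]
```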
Given the user-item matrix, we can consider each row to be the features of a user, and each column to be the features of an item. These features can be used to compute the similarity between two users or two items. Intuitively, item-similarity modeling groups together the items that are likely to be listened to together, and recommends them to the user. In contrast, user-similarity models find similar users and recommend the songs those similar users have listened to.
In addition to the user-based and item-based similarity models, there are other collaborative filtering models that fill in the user-item matrix with more complicated models. One famous model is called matrix factorization, which I’ll talk about in the upcoming blogs.
Collaborative filtering approaches usually result in better recommendations than content-based models in practice, since they can incorporate interactions across users in the model. However, they suffer from the so-called cold-start problem, which arises when a new user or a new item is added to the dataset. Pure collaborative filtering approaches have no interaction data to work with for new users, and never recommend newly added items. Dealing with this is an active research area in recommendation models.
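A minimal sketch of both ideas, using rows as user features and columns as item features. I am assuming cosine similarity again, treating unknown entries as 0 for simplicity, and the tiny matrix is invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def column(matrix, col):
    """Extract one item's column, i.e. that item's feature vector."""
    return [row[col] for row in matrix]


def most_similar_user(matrix, user):
    """User-based view: find the other user whose row is closest to this one."""
    others = [u for u in range(len(matrix)) if u != user]
    return max(others, key=lambda u: cosine(matrix[user], matrix[u]))


def item_based_recommend(matrix, user, top_k=2):
    """Item-based view: score unheard songs by similarity to the listened ones."""
    n_items = len(matrix[0])
    listened = [i for i in range(n_items) if matrix[user][i] == 1]
    scores = {}
    for candidate in range(n_items):
        if candidate in listened:
            continue
        scores[candidate] = sum(
            cosine(column(matrix, candidate), column(matrix, i)) for i in listened
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical binary matrix: rows are users 0..3, columns are songs 0..3.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
print(most_similar_user(matrix, 3))      # 0
print(item_based_recommend(matrix, 3))   # [1, 2]
```

User 3 has only listened to song 0. Song 1 comes out on top because the users who listened to song 0 mostly listened to song 1 as well.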
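To see why new items are a problem, here is a tiny sketch that scores candidate songs by counting co-listens. That is a deliberately simple stand-in for a collaborative model, and the matrix is made up. A song that nobody has listened to yet can never score above zero.

```python
def co_listen_score(matrix, candidate, listened):
    """Count users who listened to both the candidate and one of the listened songs."""
    return sum(row[candidate] * row[song] for row in matrix for song in listened)


# Hypothetical matrix: rows are users, columns are songs 0..2.
# Song 2 was just added to the catalogue, so its column is all zeros.
matrix = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
scores = {song: co_listen_score(matrix, song, listened=[0]) for song in (1, 2)}
print(scores)  # {1: 2, 2: 0}
```

The same thing happens for a brand-new user: with an empty `listened` list, every candidate gets a score of 0 and the model has nothing to rank by.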
The hybrid models make use of both ratings and features to make a recommendation model. These models are the combination of the content-based and collaborative filtering approaches and may make use of the item contents as well as the user-item interactions.
You can obtain a hybrid model by first making pure content-based and collaborative filtering models independently and then combining the results together. More robust approaches unify the two into one model, and may use the user/item features jointly with the interactions.
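The first, simpler approach can be sketched as a weighted average of the two models’ scores. The scores, song names and the `weight` parameter below are hypothetical, and in practice the two score scales would need to be made comparable first.

```python
def blend(content_scores, collaborative_scores, weight=0.5):
    """Weighted average of two score dictionaries; a missing score counts as 0."""
    songs = set(content_scores) | set(collaborative_scores)
    return {
        song: weight * content_scores.get(song, 0.0)
        + (1 - weight) * collaborative_scores.get(song, 0.0)
        for song in songs
    }


# Hypothetical scores from two independently built models.
content = {"s1": 0.9, "s2": 0.4, "new_song": 0.8}
collaborative = {"s1": 0.3, "s2": 0.7}  # no interactions yet for new_song

combined = blend(content, collaborative, weight=0.5)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)  # ['s1', 's2', 'new_song']
```

Notice that `new_song` still gets a non-zero score through its content features, even though the collaborative model knows nothing about it.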
End-to-End recommendation on Google Cloud Platform
So far, the focus of this blog has been on the recommendation models. Now, let’s assume that we have trained our model and have our prediction model ready. That gets us part of the way to a recommendation service, but there are still a bunch of software engineering and development tasks remaining before we have a fully functional service, i.e. implementing the boring plumbing stuff!
Now we need to connect the dots and complete the software system in order to have a useful service for the end users. The trained recommendation engine is capable of ranking the items with respect to the query, and is the computational core of the system. It has to work in conjunction with the other components of the system to obtain a workable service. The architecture described below is much the same for all machine learning services, except that another machine learning model takes the place of the recommender:
So typically for a recommendation service, just like almost any other software, we need a backend, DBMS and a frontend. The recommendation system by itself might require other dependencies like a database. In this case, we can assume that the green box above contains other modules inside it.
Google Cloud Platform (GCP) is a well-suited environment for such services. It provides serverless and managed services for all the modules above. In my upcoming blogs, I’ll investigate the different options for each building block for our own use case, and will also implement different machine learning recommendation engines on the cloud. One big advantage of Google Cloud is its scalability. The service can easily be scaled on request and, even better, will scale automagically with the amount of service usage and request volume.
The focus of the blog is machine learning on GCP, so let’s briefly touch on it at a high level before deep diving into the core services in the later blog posts.
Machine Learning on Google Cloud Platform
GCP gives access to some general Google ML services through APIs such as the Cloud Vision API for object recognition in images, the Speech API for speech-to-text conversion, etc. It is also flexible enough for machine learning specialists to implement their own models and make use of Google’s hardware for training and prediction. Cloud machine learning makes use of the open source TensorFlow library, lets the prediction service scale as requests rise, and integrates with big data services on GCP such as BigQuery.
Machine learning on GCP is based on Google’s TensorFlow library. The Google Brain team, part of Google’s machine intelligence research organisation, originally developed TensorFlow as an open source library for deep learning, but the library can be harnessed for other machine learning models too. TensorFlow can make use of multiple CPUs and GPUs, and can be executed locally, on the cloud, or on mobile devices.
A model in TensorFlow is a graph whose nodes are mathematical operations. N-dimensional data (tensors) flow through the graph, and the mathematical operations defined by the nodes are applied to them. The core library is written in C++, but it is typically used through Python wrappers.
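If the graph idea sounds abstract, here is a toy illustration in plain Python. To be clear, this is not TensorFlow’s API. The `Node`, `placeholder`, `add` and `scale` names are my own invention, and plain lists stand in for tensors. It only shows the concept of first describing a computation as a graph and then letting data flow through it.

```python
class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self, feed):
        """Evaluate this node, pulling values through the graph from the feed."""
        if self in feed:
            return feed[self]
        args = [node.run(feed) for node in self.inputs]
        return self.op(*args)


def placeholder():
    """An input node whose value must be supplied when the graph is run."""
    def missing():
        raise ValueError("no value fed for placeholder")
    return Node(missing)


def add(a, b):
    """Element-wise addition node."""
    return Node(lambda x, y: [p + q for p, q in zip(x, y)], a, b)


def scale(a, factor):
    """Node that multiplies every element by a constant."""
    return Node(lambda x: [factor * p for p in x], a)


# Build the graph first: z = 2 * (x + y). Nothing is computed yet.
x = placeholder()
y = placeholder()
z = scale(add(x, y), 2)

# Then let the data flow through it.
print(z.run({x: [1, 2], y: [3, 4]}))  # [8, 12]
```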
Cloud Machine Learning Engine
The Machine Learning Engine (MLE) is Google’s managed service for creating machine learning models for any data. The MLE can be used to train a TensorFlow machine learning model, and it provides prediction services for a trained model through REST web services. After the model is trained, it is immediately available for prediction, with the ability to scale with an increase in requests. Another great capability of MLE is its integration with Cloud Dataflow, BigQuery, Cloud Storage, etc. This makes machine learning with big data possible. Woo hoo!
In order to use the machine learning services on GCP, the model is implemented in TensorFlow, and the data is kept accessible to the MLE on the cloud. It is then easy to train the model on the cloud, or to communicate with the learned model through REST web services for prediction.
In the upcoming blogs we will deep dive into implementing a music recommendation web service on the cloud. The purpose of the recommender is to recommend unlistened songs from our song dataset to a user, based on the songs they mark as listened. To this end, we make use of a publicly available research dataset called the Million Songs Dataset.
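To give a feel for the REST side, here is a sketch that only builds (and does not send) an online prediction request. The URL pattern and the `instances` wrapper follow my understanding of the v1 prediction API at the time of writing. The project name, model name and the `listened_song_ids` input key are hypothetical placeholders, and authentication (an OAuth token in the `Authorization` header) is left out.

```python
import json
import urllib.request


def build_predict_request(project, model, instances):
    """Build (but do not send) an online prediction request for a deployed model."""
    url = f"https://ml.googleapis.com/v1/projects/{project}/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical project, model and input format.
request = build_predict_request(
    "my-project",
    "music_recommender",
    [{"listened_song_ids": [12, 407, 33]}],
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

What goes inside each instance depends entirely on the inputs your own TensorFlow model was exported with, so treat the payload above as a shape to adapt rather than something to copy.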
Million Songs Dataset (MSD)
I made use of the publicly available Million Songs Dataset (MSD) for the music recommendation service on the cloud. The MSD contains a million different songs from different artists. It also comes with some extra information about the songs, like the corresponding album, year, popularity, loudness, danceability, timbre, etc. The timbre features are of particular interest since they represent the songs’ musical characteristics: the items’ contents, remember? Of course you do!
In addition to the MSD itself, a dataset of listened songs is available from a Kaggle competition. It consists of the songs that are available in MSD and have been listened to by at least one user. The listening data covers 110k users and only encompasses a subset of the whole dataset. This dataset is particularly suited to collaborative filtering models, and that is how we will use it when training the relevant models.
GCP services used for music recommendation
In our implementation of the music recommendation, we make use of several different services on GCP for different purposes. The tech stack consists of:
- Google Machine Learning Engine (MLE) and TensorFlow (TF): for making and learning the model and providing prediction services from the learned model for a collaborative filtering recommender
- Google Kubernetes Engine (GKE) and TensorFlow Serving (TFS): for making a user-managed model to work as a content-based filtering model
- Google Compute Engine (GCE): for a service to call TensorFlow serving
- Google Cloud Dataflow and TensorFlow Transform (TFT): for preparing and preprocessing the data as training dataset
- Google App Engine (GAE): for backend and frontend for our recommendation web service
- Google Cloud SQL: for keeping the music data to be presented to the users on the web service
- Google Cloud Storage: for keeping the data required by different engines above
Although you don’t need all of these technologies for a machine learning project on GCP, we tried different options in different versions of our implementation. I wanted to give readers the options and talk about the pros and cons of each technology.
Almost done. Bear with me for just a little longer, please!
Finally, let’s see what you should expect in the upcoming parts of the series.
Upcoming posts in the series
We will focus our attention on implementing a music recommendation on the Google Cloud. Throughout the upcoming blogs we will deep dive into the architecture and technologies that we used, different approaches to target our problem from different perspectives, advantages/disadvantages/pitfalls, and the machine learning models we used to implement them.
Here’s an overview of what I’ll cover later on:
- Similarity-Based Recommendation: the content-based model implementation with focus on the architecture of our service after prediction, and usage of GKE and TensorFlow serving for our recommender model
- Matrix Factorization Model: the collaborative filtering model implementation and training, with focus on how to prepare the data (TFT and Dataflow), train the model (MLE) and communicate with the MLE
- Deep Auto-Encoder/Decoder model: a hybrid model with deep learning feature representation with the focus on a better model and resolution of the cold start problem
- Guilty Pleasures: another model with some fun to incorporate diversity into the model and find inherent patterns in the data with the purpose of recommending songs far from the user’s commonly listened genres
So stay tuned, and have fun reading... and learning (pun intended)!