Getting ya music recommendation groove on with Google Cloud Platform! Part 2

Getting ya music recommendation groove on with Google Cloud Platform! Part 2


In part 1, we learnt about recommendation engines in general, and looked at ways to implement a service using the Google Cloud Platform (GCP). In part 2 of the blog series, we are getting our hands dirty on the item-similarity model and TensorFlow implementation of it.

This is our first technical blog of the series. Here, I deep dive into the data processing step, the recommendation service, and some hints on how to optimise the code to have real-time responses. You should expect to know how to build a simple item-similarity recommender engine by the end of this blog.

So let’s get the party started!

Songs data processing

We want to recommend songs that are like other songs, so to build our item-similarity recommendation engine we will need a database of song information.


The Million Songs Dataset (MSD) contains a variety of attributes for each song. The nature of some of these are informative and are suitable for visualisation. For example, the song name, artist, album, release year, etc. Also, some other existing information about the songs relate to the musical and sound aspects of the songs. Among the available attributes of the songs in this group, timbre might be the most useful for us. It is a good representation of the content of a song, and it seems it is what allows us to distinguish two musical instruments when they play the same note and at the same amplitude. So we now found the content features for our songs, right? Not quite! The issue is that the timbres are a stream (sequence) of numbers, and the size of the sequence differs from song to song.


Timbres for a sample song

So in order to have a representation of the songs, we preprocess the songs timbres in a way that we get a fixed-size vector of numbers in the end. The timbres of the songs are of the size (12 x segments) with 12 being the timbre dimensions and segment being different from song to song which equals the number of segments on which the timbres are computed. For this item-similarity model, we will use a feature of size 90,


which concatenates the average of each timbre dimension (12 features) and the diagonal and lower triangle of the covariance matrix (78 features) which shows the variance of each dimension (diagonal elements of the matrix) and covariance of each pair of the timbre dimensions. This sums to 90 elements for feature vector. Note that these features are used in several other works, e.g. to predict songs release year, etc.

After reading the hdf5 files in MSD dataset, the timbres averages and covariance matrix are computed for each song. Then, the features are saved into a CSV file. The CSV file of features is copied to GCS to be ready for the item-similarity recommendation computation.

Similarity service and  TensorFlow


So far, we extracted the songs content features. The remaining part is the core of the system, the recommender engine.

For our first recommendation engine, an easy similarity model is taken into account here. This computes the similarity of the songs with each and every listened song, computes the average of similarities and returns the most similar ones as the recommended songs.

Item-similarity computation

The similarities are computed by cosine similarity. Back to high school math, the cosine similarity is a value between -1 and 1 and is the cosine of the angle between two vectors. If the two vectors are pointing towards the same direction, their cosine similarity will be 1. If they are in opposite direction the similarity will be -1 and if they are perpendicular, the value of similarity will be 0. Note that cosine similarity is a better similarity metric than let’s say euclidian distance when the dimensionality of the space is large. The cosine similarity between two vectors A and B is:

sim(\vec{A},\vec{B}) = \frac{\vec{A}.\vec{B}}{|\vec{A}|.|\vec{B}|} 

Feature matrix row-scaling and faster similarity computation

We can use some math and linear algebra to optimise the computation of similarities. It is always a good idea to compute multiplications in matrix form as long as we can keep them in memory. The magnitudes of feature vectors are fixed and are present in all the similarity computations.


So in order to make the cosine similarity computation faster, we can divide the feature vector elements by the feature vector magnitude to get the scaled feature vectors. This process is called feature scaling or normalisation. We keep and work with the scaled vectors instead of the original feature vectors. If we are working with the scaled vectors, the inner product of two scaled features gives us the similarity. As we have computed and scaled the vectors once, we don’t need to recompute the magnitudes again so the similarity computation will require less floating point operations. So as our first optimisation, we first scale all the feature vectors and keep them. Then, we compute the similarities on top of the scaled feature vectors.

Song recommendation vectorisation

So far we made our decision to use cosine similarity and saw that we can compute it faster by working directly on the normalised feature vectors instead of the feature vectors themselves. Having this in hand, the next step is to find the recommended songs; It is simple, you basically compute the similarity of every single song in the dataset with each and every listened song, compute their average and return the id of the songs that have the highest similarity. It makes sense, doesn’t it? Yes if you are happy to wait 10 seconds to get a response!


Mission failed. But don’t give up!

But wait a sec! This means that we need to write a nested loop to compute the similarity of each listened song with every single existing song (1 million songs), and then, sum the similarities for each song in the dataset. Then, sort them and return the index of the first top K. If we write everything in this naive way, we need a few seconds to respond to the requests. So a more efficient way of computation is required. This is where linear algebra and matrix multiplication come handy.

We do all these computations in the fastest way possible by collecting everything into matrices. For the feature vectors, we gather the feature vectors into a matrix called feature matrix F. Each row of the matrix corresponds to a song and will encompass the scaled feature vector of the song.

F = \begin{bmatrix} \hat{f}_1 \\ \hat{f}_2 \\ \vdots \\ \hat{f}_n \end{bmatrix} = \begin{bmatrix} \frac{\vec{f}_{1}}{|\vec{f}_{1}|} \\\frac{\vec{f}_{2}}{|\vec{f}_{2}|} \\ \vdots \\ \frac{\vec{f}_{n}}{|\vec{f}_{n}|} \end{bmatrix}   

Matrix is loaded and computed once and is kept in the memory when the service is loaded. Additionally, for each request, a matrix L is constructed using the scaled feature vectors of the listened songs. If the indices of the listened songs are (id_1, id_2, \cdots, id_i), then the matrix L is as follows:

L =\begin{bmatrix} \hat{f}_{id_1} \\ \hat{f}_{id_2} \\ \vdots \\ \hat{f}_{id_i} \end{bmatrix}

Having these two matrices, we can compute the similarities by first multiplying F with the transpose matrix of L, and then summing (or averaging) over each row of the matrix. This will result in the similarities for all songs.

similarities =sum_{rows}( F.L^T )

The next step is to return back the song ids with top k similarity values. Algorithmically, it is much faster to find the top k elements in a list than to sort and pick the first k ones. We will also make use of this fact in our implementation.

So to recap, the service will load the features from CSV and scales each feature vector, generates the feature matrix and keeps it in memory. Then, whenever a new request is received, it will collect the features of the listened songs, and generates the L matrix. Then, it computes the similarities for all the 1Million songs and returns the top k ones (which are not in the set of listened songs).

Also, please note that the process of scaling the vectors is done once. It is possible to do this process and save the features as a CSV file and load this feature into the model or to do feature scaling at the time of reading the data. We’ve written our model to be general enough to work with any CSV file, and as this process is performed once, it is safer to do the scaling regardless of the whether the features are scaled before or not. This will guarantee that the computation of similarities are correct and will work with the features that are saved as CSV.

Recommender service

So now you know how the item similarity works. The next step is simply to go ahead and implement the item similarity.

The service is written in python and TensorFlow. Note that the computation of item similarities can be computed in pure python and numpy, but using TensorFlow allows us to use very efficient matrix/vector operators (and GPUs for it if you like) and the final results are much faster. So we use TensorFlow to do all the computation. In fact, when using TensorFlow with the matrix form, we are able to compute all the similarities for all the 1 Million songs, and return the top most recommended songs in ~0.1 seconds, even when using CPUs, so it is worth it, isn’t it?

So let’s look into the process of preparing the TensorFlow model.

TensorFlow model preparation

Some TensorFlow basics


The models in TensorFlow are represented by a computation graph, with nodes being computational or mathematical operators and the edges showing where the inputs of the operation come from and where the outputs after applying the mathematical operator should go to. The operators in TensorFlow are differentiable (meaning you should be able to compute the gradients of the output with respect to the operator inputs), and that is why you sometimes do not find an operator in TensorFlow which corresponds to an easy-to-do task in programming.

The inputs and outputs of an operation are zero or more tensors (N-dimensional arrays). The inputs of the model are also provided to the graph as tensors. The tensors will go through the operators in the graph one after the other until the outputs of the graph are obtained. We can directly tell the graph if a tensor is a learnable parameter and/or is part of the model, or whether it is one of the inputs and should be provided each time we want to run the graph. The parameters of the model are defined by the Variable class.  This is how you can define a Variable tensor for a model.

variable_param1 = tf.Variable(initial_value = initial_value_np_array, dtype=tf.float32, name='variable_param1')

Another type of tensors are the ones that are directly fed into the graph, i.e. the input tensors. These are defined by the Placeholder operator.

placeholder_param1 = tf.placeholder(dtype=tf.float32, shape=(dim1,dim2), name='placeholder_param1')

As you can see in the syntax above, the placeholder does not receive any input at the time of declaration. This is because there are two phases in using a TensorFlow model. The construction phase is when we introduce the graph, and the runtime phase is to execute the graph using the inputs provided at runtime. As the placeholder introduces the input of the graph, it receives the values in the runtime phase. In the construction phase, it is sufficient to let the graph know that a parameter is a placeholder. If a parameter is declared as a placeholder, the graph will request the value of it to be fed at runtime.

A graph is typically declared by defining the operators and their inputs and outputs. For example, we can construct a small graph to add the elements of two tensors as follows.

param1 = tf.placeholder(dtype=tf.float32)
param2 = tf.placeholder(dtype=tf.float32)
summation_results = tf.add(param1,param2)

In order to run the model and do the calculations (runtime), a graph must be launched in a session.

session = tf.Session()

The session will provide methods to load and execute the graph. After the model is executed, we can receive the output of the graph (or any parameter inside the graph) in numpy arrays.

np_param_val =, feed_dict={placeholder_param: np_input})

The first parameter is the output that we expect to retrieve. The values of the placeholders are provided by the feed_dict parameter.

np_param_val =, feed_dict={placeholder_param: np_input})

Now back to our small summation graph, we can run our graph as below.

with tf.Session() as sess:
    summation_results_np =, feed_dict={param1 : [[1.0, 2.0], [3.0, 4.0]] , param2:[[1.0,1.0], [1.0,1.0]]})

Running this code will return

array([[ 2., 3.], [ 4., 5.]], dtype=float32)

Recommender model in TensorFlow

Now we have enough understanding of the TensorFlow process. Let’s now look into the implementation of our item-similarity TensorFlow model.

import tensorflow as tf
np_features_mat = load_feature_file_and_scale(songs_features_file)
features_tensor = tf.Variable(initial_value=np_features_mat, dtype=tf.float32, name = 'features')
no_recom_songs = tf.placeholder(dtype = tf.int32 , shape=() )
user_songs_tensor = tf.placeholder(dtype = tf.int32 , shape=[None])
######creating the model
#L matrix
L_transpose_tensor = tf.transpose(tf.nn.embedding_lookup(features_tensor, user_songs_tensor))
# similarity computation
similarities = tf.transpose(tf.reduce_sum(tf.matmul(features_tensor,L_transpose_tensor), axis=1))
#change values of the similarities for listened songs to -1 (so that they won't appear in top k recommended songs)
similarities = tf.scatter_update(similarities, user_songs_tensor, tf.fill([tf.size(user_songs_tensor)],-1.0))
#return top k recommendations recommendations (first dimension corresponds to the value and second dimension corresponds to indices. We want indices, that is why [1] in the end.
recommended_songs = tf.nn.top_k(similarities, k= no_recom_songs )[1]
#end of model creation

The first method in the code snippet reads the CSV file of features row by row, scales them and saves them into a numpy 2D array. The process of scaling rows of the feature matrix is written in numpy since we don’t want to make it part of the graph. This is because we do not want this computation to be done per request, and it should be done once and the results are further used for query responses. But if you like, you can do this scaling in TensorFlow just like below:

tf_features_mat = tf.placeholder('float' , shape=(no_songs, noFeatures))
features_tensor = tf.nn.l2_normalize(tf_features_mat, dim=1)

The next step is to introduce the variables and inputs to the graph. The features are introduced as the variable parameter of the TensorFlow graph. This is because the features are considered as the internal parameter of the graph and we want to consider them as a variable inside the model, not as an input parameter to the model.  Also, the request inputs, i.e. listened song ids and number of songs to recommend are introduced as placeholder variables, as these are the input to the graph.

We then make the matrix L^T by collecting the rows of the features tensor with the tf.nn.embedding_lookup and transposing the matrix.  the embedding_lookup receives a tensor and the indices as input and will pick the rows from the matrix tensor corresponding to the indices and returns a new tensor with the picked rows.

After the matrix preparation, computation of the inner product of the features_tensor and the transpose of L matrix is performed. We then compute the sum of rows by calling the tf.reduce_sum operator. Then tf.scatter_update is called to convert the similarity of the listened songs to -1. This is because we will return the top K songs in the next step and we don’t want the listened songs to be among the recommended songs. Changing the similarity values for the listened songs to -1 will ensure that they are not returned as the recommended songs.

The last line denotes to finding top_k similarities by calling tf.nn.top_k and returning the indices of top k songs as the output of the model. This will end our TensorFlow graph construction.

Now that we have constructed the model, we should prepare it for deployment in the cloud. As you will see in the next blog, we will use TensorFlow serving for this. In order to prepare our model to be served, we should save the model, and introduce the saved model to TensorFlow Serving. This is how you can define the input and output signature for TensorFlow Serving and save your model.

from tensorflow.python.saved_model import builder as saved_model_builder
builder = saved_model_builder.SavedModelBuilder(export_path)
predict_input_listened_songs = tf.saved_model.utils.build_tensor_info(user_songs_tensor)
predict_input_no_recommended_songs = tf.saved_model.utils.build_tensor_info(no_recom_songs)
predict_outputs = tf.saved_model.utils.build_tensor_info(recommended_songs)
prediction_signature = (tf.saved_model.signature_def_utils.build_signature_def(
        'listened_songs': predict_input_listened_songs,
        'no_recom_songs': predict_input_no_recommended_songs
        'recom_songs_ids': predict_outputs
legacy_init_op =, name='legacy_init_op')

Here we make a model saver to be able to save our model. We also need to introduce what is given to the model as the input and what is considered as output.

Running the above codes will generate the computation graph in TensorFlow and saves it for TensorFlow Serving. The next step is to prepare a cluster in GCP and ask the cluster to have TensorFlow Serving as a service. We will discuss our architecture on GCP, and the process of preparing our TensorFlow serving in part 3 of our blog series.

Item similarity at a glance

I am glad you were stubborn enough to come this far. Let’s wrap up.

In this blog, we talked about the item-similarity model and its implementation in TensorFlow. The model is basically computation of similarity between the listened songs and the available songs in our dataset based on the features of the songs. Although the model is very simple (and doesn’t involve machine learning if you like!), it still appears to work well in practice and in real-time, provided we implement the algorithm efficiently.

We first looked into the process of generating our features for the content-based item similarity model.  We observed that we require features to be able to compare the songs. Then, some details about how to exploit TensorFlow for our means were provided and the cuts and pieces and code snippets were given.

This was the new start to the technical world of recommendation in our blog series. In part 3 of the blog series, you’ll first see the architecture of the system on the GCP, and how to serve our model in TensorFlow Serving. Then, In later blogs, you’ll get your hands dirty on the actual machine learning models and how they are implemented in more detail.

So stay tuned!
No Comments

Leave a Reply