In parts 1 and 2 of this blog series, we’ve seen how to implement an item-similarity model in TensorFlow, and the intuition behind various recommender models. It’s now time to take a high-level view of a recommendation project on the Google Cloud Platform. This will encompass all of the plumbing for the web service, so that it can be up and available on the web. I will outline two possible architectures – one where we deploy and manage TensorFlow ourselves using the Google Kubernetes Engine (GKE), and the other using the fully-managed Cloud Machine Learning Engine (MLE). You’ll also find out how to communicate with the ML Engine modules, and how to configure your computational clusters.
Recommendation systems are found under the hood of many popular services and websites. The e-commerce and retail industries use them to increase sales, music services use them to surface interesting songs for their listeners, and news sites use them to rank the daily articles based on their readers’ interests. If you really think about it, recommendation systems can be used in pretty much every area of daily life. For example, why not automatically recommend better choices to house investors, guide your friends around your hometown without you being there, or suggest which company to apply to if you are looking for a job?
All pretty cool stuff, right?
But recommendation systems need to be a lot smarter than plain old vanilla software. In fact, the engine is made up of multiple machine learning modules that aim to rank the items of interest for each user based on the users’ preferences and the items’ properties.
In this blog series, you will gain some insight into how recommendation systems work, how you can harness Google Cloud Platform for scalable systems, and the architecture we used when implementing our music recommendation engine on the cloud. This first post will be a light introduction to the overall system, and my follow-up articles will dive deep into each of the machine learning modules, and the tech that powers them.
The TEL group was established in 2011 with the aim of publicising the great technical work that Shine does, and raising the company’s profile as a technical thought-leader through blogs, local meetup talks, and conference presentations. Each month, the TEL group gathers up all the awesome things that Shine folk have been getting up to in and around the community. Here’s the latest roundup:
*Updated on 16th December 2016 – see below
With the announcement of Amazon Athena at this year’s AWS re:Invent conference, I couldn’t help but notice its striking similarity to a rival cloud offering. I’m talking about Google’s BigQuery. Athena is a managed service allowing customers to query objects stored in an S3 bucket. Unlike other AWS offerings like Redshift, you only need to pay for the queries you run. There is no need to manage or pay for infrastructure that you may not be using all the time. All you need to do is define your table schema and reference your files in S3. This works in a similar way to BigQuery’s federated sources, which reference files in Google Cloud Storage.
Given this, I thought it would be interesting to compare the two platforms to see how they stack up against each other. I wanted to find out which one is faster, which one is more feature-rich, and which one is more reliable.
Last week I was very lucky to be able to attend the YOW! 2016 Conference in Melbourne. I had never attended a major conference aimed purely at software developers before, and when I arrived early on the first day I wasn’t quite sure if I had the right building. Thankfully, within 30 seconds of walking in the door I spotted a man wearing a fedora and I knew I had come to the right place!
The conference overall was an extremely well-run affair. The speakers were all very good, and many were either from high-profile companies such as Facebook and Uber or were outright living legends of the industry such as Robert ‘Uncle Bob’ Martin. There were three talks to choose from during each time slot and they covered a wide range of topics. The hardest bit was choosing which talk sounded most interesting, and I suffered from severe ‘Fear of Missing Out’ syndrome when making my selections.
I would highly recommend attending to anyone who is looking to gain a better sense of what’s going on in the software industry. Setting aside two whole days to listen to presentations, talk to other developers and generally ruminate about the craft of developing software is a great way to take a step back from the daily grind and spend some time looking at the forest instead of the trees. I picked up a number of things that I’ll be able to take back and directly apply in my day-to-day development.
I’ve summarised one of my favourite talks below:
Shine’s very own Pablo Caif will be rocking the stage at the very first YOW! Data conference in Sydney. The conference will be running over two days (22-23 Sep) and is focused on big data, analytics, and machine learning. Pablo will give his presentation on Google BigQuery, along with a killer demo of it in action. You can find more details of his talk here.
When you have a Big Data solution that relies upon a high-quality, uninterrupted stream of data to meet the client’s expectations, you need a solution in place that is extremely reliable and has many points of fault tolerance. That all sounds well and good, but how exactly does that work in practice?
Let me start by explaining the problem. About 2 years ago our team was asked to spike a streaming service that could stream billions of events per month to Google’s BigQuery. The events were to come from an endpoint on our existing Apache web stack. We would be pushing the events to BigQuery using an application written in PHP. We did exactly this; however, we found that requests to BigQuery were taking too long, which resulted in slow response times for users. So we needed to find a way to queue the events before sending them to BigQuery.
With the current move to cloud computing, the need to scale applications presents itself as a challenge for storing data. If you are using a traditional relational database you may find yourself working on a complex policy for distributing your database load across multiple database instances. This solution will often present a lot of problems and probably won’t be great at elastically scaling.
As an alternative you could consider a cloud-based NoSQL database. Over the past few weeks I have been analysing a few such offerings, each of which promises to scale as your application grows, without requiring you to think about how you might distribute the data and load.
The YOW! 2015 Developer Conference in Melbourne took place a few weeks ago, and once again the organisers did a splendid job curating a selection of both international and local speakers (including Shine’s very own Ben Teese). There were also delicious meals and glorious developer fuel (a.k.a. coffee) to keep the energy going strong between the amazing talks.
This year’s conference felt like it featured a wider variety of topics compared to previous years; headlining were Mobile development, Lean and Agile, Performance Testing, Software Architecture and Design, Big Data, Cloud Platforms, and DevOps. There was one topic, however, that took the crown and was presented with an overwhelming sense of urgency and importance: Microservices.
We had talks from big players such as Facebook, Uber, ThoughtWorks and Netflix, each giving an insight into how they are using microservices and how nearly everything they have done is a microservice (over 1000 services!). It is safe to say that it was this year’s favourite buzzword.