Forecasting weather with BigQuery ML

Weather forecasting is a complicated process. If you live somewhere with weather as changeable as ours in Melbourne, you should always allow for some chance that conditions will differ from what the websites tell you.

Weather is typically forecast by first gathering a large amount of data about the atmosphere – humidity, wind, and so on – and then relying on our knowledge of atmospheric physics to model how conditions will change in the near future. But given our incomplete understanding of those physical processes, and the chaotic nature of the atmosphere itself, such forecasts can be unreliable.

Instead of this conventional approach, here we explore the idea of entrusting the job to a machine learning model. We expect the model to look at historical data and get a feel for how the temperature will change in the near future – let’s say tomorrow.
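
To make that concrete, here’s a minimal sketch of what training and querying such a model in BigQuery ML might look like. The dataset, table, and column names are hypothetical, and I’ve assumed a simple linear regression – the actual model and features may well differ:

    # Train a regression model on historical observations
    # (dataset/table/column names are hypothetical placeholders)
    bq query --use_legacy_sql=false '
    CREATE OR REPLACE MODEL weather.tomorrow_temp_model
    OPTIONS (model_type = "linear_reg", input_label_cols = ["next_day_temp"]) AS
    SELECT temp, humidity, wind_speed, next_day_temp
    FROM weather.historical_observations'

    # Ask the trained model for a prediction
    bq query --use_legacy_sql=false '
    SELECT predicted_next_day_temp
    FROM ML.PREDICT(MODEL weather.tomorrow_temp_model,
      (SELECT temp, humidity, wind_speed FROM weather.latest_observation))'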

Getting Started with Azure Kubernetes Service

We were recently tasked with delivering a proof of concept for a large retailer to help them easily scale their Virtual Machines (VMs) and Docker containers in the Azure cloud. This meant we had to familiarise ourselves with Azure Kubernetes Service (AKS), and we thought it would be a good opportunity to share our findings.
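
To give a flavour of what’s involved, standing up a basic cluster with the Azure CLI goes roughly like this. The resource group, cluster name, node count and region below are placeholder values, not what we used for the actual proof of concept:

    # Create a resource group to hold the cluster
    az group create --name aks-poc --location australiaeast
    # Create a small AKS cluster
    az aks create --resource-group aks-poc --name poc-cluster \
        --node-count 3 --generate-ssh-keys
    # Fetch kubectl credentials and confirm the nodes are ready
    az aks get-credentials --resource-group aks-poc --name poc-cluster
    kubectl get nodes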

Migrating Blobstore between Projects

What is Blobstore? What is a Blob?

Like horse-drawn carriages, video rental stores, and scurvy, Blobstore is a leftover from an earlier time. It is a storage option on Google Cloud Platform (GCP) that stores objects called blobs, each associated with a key. It is used with Google App Engine services and lets applications serve and retrieve files over HTTP.

Blobstore has since been superseded by Google Cloud Storage (GCS), but you can still use it – with the actual storage backed by GCS, the same upload behaviour, and minimal changes to your app.

Unlike most other GCP services, though, migrating Blobstore from one project to another is not straightforward. In this post, we’ll investigate how to do it.
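
To illustrate one piece of the puzzle: because the blobs themselves now physically live in GCS, the raw objects can at least be copied between projects with gsutil. The bucket names below are hypothetical, and note that this alone does not carry across the blob keys your application refers to – which is part of why the migration takes more care:

    # Copy the underlying objects into a bucket owned by the target project
    # (bucket names are hypothetical; blob keys are NOT migrated by this)
    gsutil -m cp -r gs://source-project-blobs/* gs://target-project-blobs/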

Using gcloud Formats and Projections in the Google Cloud Platform

Recently, I was hunting around the internet for an easy way to extract an attribute of one GCP resource so I could cross-reference it while creating another resource in gcloud. I had reserved a static IP address, and I wanted to use it as the external address of a VM instance. I learnt that such a simple operation was indeed tricky, at least up until some time ago. Here’s my journey – welcome aboard!
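
To give away a little of the ending, here’s the shape of the one-liner that formats and projections make possible. The resource names, region, and zone are placeholders:

    # Reserve a static external IP address
    gcloud compute addresses create my-static-ip --region australia-southeast1
    # Use a format projection to pull out just the address attribute
    IP=$(gcloud compute addresses describe my-static-ip \
        --region australia-southeast1 --format='value(address)')
    # Feed it straight into the creation of the VM
    gcloud compute instances create my-vm \
        --zone australia-southeast1-a --address "$IP"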

Wrangling CloudFormation with Sceptre

Anyone who has delved into CloudFormation knows its power for describing and managing your cloud infrastructure within AWS. Likewise, if you’ve spent any time writing CloudFormation templates of any significance, you’ll know that you end up spending a lot of time duplicating sections of templates.

We always aim to reduce repetition in code, so this can be a bit grating.

In this post, I hope to explore a few technologies that can help with this, primarily a tool called Sceptre from Cloudreach.
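
As a preview of the shape of things, a Sceptre project (I’m assuming the 1.x-era CLI here) keeps a single copy of each template and layers environment-specific configuration on top of it:

    # Typical layout: each template is written once, then parameterised
    # per environment through lightweight YAML stack configs
    #
    #   config/dev/config.yaml   # environment settings (e.g. region, profile)
    #   config/dev/vpc.yaml      # stack config: template path + parameters
    #   templates/vpc.yaml       # the CloudFormation template itself
    #
    # Launch every stack in an environment with a single command
    sceptre launch-env dev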

Getting ya music recommendation groove on, this time on Amazon Web Services

In this blog series so far, I have presented the concepts behind a music recommendation engine, a music recommendation model for TensorFlow, and a GCP architecture to make it accessible via the web. The end result has been an ML model wrapped in a stand-alone service that gives you predictions on demand.

Before diving further into implementing more complicated ML models, I thought it would be worth looking into how we could deploy our TensorFlow model into AWS. After some investigation, I’ve concluded that the best way is to use Lambda functions. In this post, I’ll explain why that’s the case, how you can do it, and an interesting pain point you have to keep in mind.

Let’s break new ground!

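To give a flavour of the mechanics (and foreshadow that pain point), deploying to Lambda means zipping your handler together with its dependencies and uploading the bundle. Everything below – function name, handler module, role ARN – is a placeholder, and in practice a stock TensorFlow install is likely to exceed Lambda’s deployment size limits, so expect to trim the bundle or upload it via S3:

    # Bundle the handler and its dependencies (all names are placeholders)
    pip install tensorflow -t package/
    cp lambda_function.py package/
    (cd package && zip -r9 ../function.zip .)

    # Create the function; large bundles usually need trimming down
    # or uploading via S3 rather than inline as shown here
    aws lambda create-function \
        --function-name music-recommender \
        --runtime python3.6 \
        --handler lambda_function.lambda_handler \
        --role arn:aws:iam::123456789012:role/lambda-exec-role \
        --zip-file fileb://function.zip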

Introducing column-based partitioning in BigQuery

Some background

When we started using Google BigQuery – almost five years ago now – it didn’t have any partitioning functionality built into it. Heck, queries cost $20 p/TB back then too, for goodness’ sake! To compensate for this lack of functionality and to save costs, we had to manually shard our tables using the well-known _YYYYMMDD suffix pattern, just like everyone else. This works fine, but it’s quite cumbersome, has some hard limits, and your SQL can quickly become unruly.

Then, about a year ago, the BigQuery team released ingestion-time partitioning. This allowed users to partition tables based on the load/arrival time of the data, or to explicitly state the partition to load the data into (using the $ syntax). By using the _PARTITIONTIME pseudo-column, users could craft their SQL more easily, and save costs by only addressing the necessary partition(s). It was a major milestone for the BigQuery engineering team, and we were quick to adopt it into our data pipelines. We rejoiced and gave each other a lot of high-fives.
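
For anyone who hasn’t used it, here’s the gist of ingestion-time partitioning as it worked at the time. The table and bucket names are placeholders:

    # Load data into an explicit partition using the $ decorator
    bq load --source_format=CSV 'mydataset.events$20180101' \
        gs://my-bucket/events-20180101.csv
    # Query just that partition via the _PARTITIONTIME pseudo-column,
    # so you are billed for one day of data rather than the whole table
    bq query --use_legacy_sql=false \
        'SELECT COUNT(*) FROM `mydataset.events`
         WHERE _PARTITIONTIME = TIMESTAMP("2018-01-01")'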

Google Cloud Community Conference 2018

As a co-organizer for GDG Cloud Melbourne, I was recently invited to the Google Cloud Developer Community conference in Sunnyvale, California. It covered meetup organization strategies and product roadmaps, and was also a great opportunity to network with fellow organizers and Google Developer Experts (GDEs) from around the world. Attending were 68 community organizers, 50 GDEs and 9 open source contributors from a total of 37 countries.

I would have to say it was the most social conference I have ever attended. There were a lot of opportunities to meet people from a wide range of backgrounds. I also got many valuable insights into how I could run our meetup better and make better use of Google products. In this post I’ll talk about what we got up to over the two days.

Thoughts on the ‘AWS Certified SysOps Administrator – Associate’ exam

A couple of weeks ago marked a significant milestone in my 14-year IT career: I actually sat a certification exam. In this case, it was the AWS Certified SysOps Administrator – Associate exam.

Despite some trepidation during my preparation, on the day I found the exam quite straightforward and came out with a pass mark. In this post I’m going to share some of my thoughts and notes in the hope that they will help others preparing to sit this exam.