Introducing column-based partitioning in BigQuery

Some background

When we started using Google BigQuery – almost five years ago now – it didn’t have any partitioning functionality built into it. Heck, queries cost $20 per TB back then too, for goodness’ sake! To compensate for this lack of functionality and to save costs, we had to manually shard our tables using the well-known _YYYYMMDD suffix pattern, just like everyone else. This works fine, but it’s quite cumbersome, has some hard limits, and your SQL can quickly become unruly.
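To make the sharding pattern concrete, here’s a rough sketch (the project, dataset and `events_YYYYMMDD` table names are invented for illustration) of how you might query a week’s worth of date-sharded tables from Python, using a table wildcard and a `_TABLE_SUFFIX` filter:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Query one week of date-sharded tables (events_20180101 ... events_20180107)
# via a table wildcard and a filter on the _TABLE_SUFFIX pseudo-column.
sql = """
    SELECT COUNT(*) AS row_count
    FROM `my-project.my_dataset.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20180101' AND '20180107'
"""

for row in client.query(sql).result():
    print(row.row_count)
```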

Then, about a year ago, the BigQuery team released ingestion-time partitioning. This allowed users to partition tables based on the load/arrival time of the data, or by explicitly stating the partition to load the data into (using the $ syntax). Using the _PARTITIONTIME pseudo-column, users could craft their SQL more easily and save costs by addressing only the necessary partition(s). It was a major milestone for the BigQuery engineering team, and we were quick to adopt it into our data pipelines. We rejoiced and gave each other a lot of high-fives.
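By way of comparison, here’s a minimal sketch (again with placeholder project, dataset and table names) of addressing a single partition of an ingestion-time partitioned table; the `_PARTITIONTIME` filter is what stops the query from scanning the whole table:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Address only one day's partition of an ingestion-time partitioned table
# by filtering on the _PARTITIONTIME pseudo-column.
sql = """
    SELECT COUNT(*) AS row_count
    FROM `my-project.my_dataset.pageviews`
    WHERE _PARTITIONTIME = TIMESTAMP('2018-01-01')
"""

for row in client.query(sql).result():
    print(row.row_count)
```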

Google Cloud Community Conference 2018

As a co-organizer for GDG Cloud Melbourne, I was recently invited to the Google Cloud Developer Community conference in Sunnyvale, California. It covered meetup organization strategies and product roadmaps, and was also a great opportunity to network with fellow organizers and Google Developer Experts (GDEs) from around the world.  Attending were 68 community organizers, 50 GDEs and 9 open source contributors from a total of 37 countries.

I would have to say it was the most social conference I have ever attended. There were plenty of opportunities to meet people from a wide range of backgrounds, and I came away with many valuable insights into how I could better run our meetup and make better use of Google products. In this post I’ll talk about what we got up to over the two days.

Thoughts on the ‘AWS Certified SysOps Administrator – Associate’ exam

A couple of weeks ago marked a significant milestone in my 14-year IT career: I actually sat a certification exam. In this case, it was the AWS Certified SysOps Administrator – Associate exam.

Despite some trepidation during my preparation for the exam, on the day I found it quite straightforward and came out with a pass mark. In this post I’m going to share some of my thoughts and notes in the hope that it will help others preparing to sit this exam.

Using Google Cloud AutoML to classify poisonous Australian spiders

Warning: This post contains pictures of spiders (and Spiderman)!

Cloud AutoML Vision is a new machine learning service from Google Cloud that aims to make state-of-the-art machine learning techniques accessible to non-experts. In this post I’ll show you how, in just a few hours, I was able to create a custom image classifier that can distinguish between different types of poisonous Australian spiders. I didn’t have any data when I started, and it required only a very basic understanding of machine learning concepts. I could probably show my Mum how to do it!
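As a taste of what’s to come, here’s a rough sketch of how a trained AutoML Vision model can be queried for a prediction over REST (the project and model IDs, the image file, and the exact v1beta1 request shape are assumptions for illustration, not code from the post):

```python
import base64

import requests
import google.auth
from google.auth.transport.requests import Request

# Obtain an OAuth token for the Cloud AutoML API (placeholder project/model IDs).
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

url = ("https://automl.googleapis.com/v1beta1/projects/my-project/"
       "locations/us-central1/models/MODEL_ID:predict")

# Send a base64-encoded image to the model and print the predicted labels.
with open("spider.jpg", "rb") as f:
    body = {"payload": {"image": {"imageBytes": base64.b64encode(f.read()).decode()}}}

response = requests.post(
    url, json=body, headers={"Authorization": f"Bearer {credentials.token}"})
print(response.json())
```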

Getting ya music recommendation groove on with Google Cloud Platform! Part 3

In parts 1 and 2 of this blog series, we looked at the intuition behind various recommender models and saw how to implement an item-similarity model in TensorFlow. It’s now time to take a high-level view of a recommendation project on the Google Cloud Platform. This will encompass all of the plumbing for the web service, so that it can be up and available on the web. I will outline two possible architectures – one where we deploy and manage TensorFlow ourselves using Google Kubernetes Engine (GKE), and the other using the fully-managed Cloud Machine Learning Engine (MLE). You’ll also see how to communicate with the ML Engine modules, and how to configure your computational clusters.
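To give a flavour of the managed option, here’s a minimal sketch of calling a model deployed on Cloud ML Engine for online prediction (the project name, model name and instance payload are placeholders; the real input schema depends on how the TensorFlow model is exported):

```python
from googleapiclient import discovery

# Build a client for the Cloud ML Engine online prediction API.
service = discovery.build("ml", "v1")

# Placeholder project and model names; a specific version can be appended
# as ".../versions/v1" if needed.
name = "projects/my-project/models/recommender"

response = service.projects().predict(
    name=name,
    body={"instances": [{"track_id": "12345"}]},  # hypothetical input schema
).execute()

print(response["predictions"])
```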

TEL Newsletter – February 2018

Shine’s TEL group was established in 2011 with the aim of publicising the great technical work that Shine does, and of raising the company’s profile as a technical thought-leader in the community through blogs, local meetup talks, and conference presentations. Every now and then (it started off as being monthly, but that was too much work), we curate all the noteworthy things that Shiners have been up to and publish a newsletter. Read on for this month’s edition.

Tips and tricks for building an Alexa skill

Recently, EnergyAustralia (one of Shine’s long-standing clients) approached us to help them build an Alexa skill in time for the launch of the Amazon Echo into the Australia/New Zealand market.

The skill will allow EnergyAustralia customers to ask Alexa for information about their bills, and to get tips on how to minimise their energy usage. In this blog post I’ll give an overview of our solution and outline some of the tips and pitfalls we discovered during development.
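To make that concrete, here’s a stripped-down sketch of the kind of AWS Lambda handler that can sit behind such a skill (the `LatestBillIntent` name and the hard-coded response are placeholders; the real skill fetches actual account data):

```python
def lambda_handler(event, context):
    """Minimal Alexa skill handler running on AWS Lambda."""
    request = event.get("request", {})

    if (request.get("type") == "IntentRequest"
            and request["intent"]["name"] == "LatestBillIntent"):
        speech = "Your latest bill is one hundred and twenty dollars."  # placeholder
    else:
        speech = "Welcome. You can ask about your latest bill."

    # Alexa expects a response envelope in this JSON structure.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```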

Shine Solutions builds EnergyAustralia “skill” featured on Amazon Alexa Australian launch

January 25, 2018

Shine Solutions has built for EnergyAustralia one of the first Amazon Alexa “skills” in the Australian market

EnergyAustralia is among the first Australia-based organisations to feature on the highly anticipated smart speaker, due to arrive in Australia next month.

The skill has been designed to enable customers to gain easy access to their EnergyAustralia accounts and provide users with the ability to better manage their energy usage.

The cloud-powered service can perform a range of tasks in response to voice commands. Users will be able to ask the ever-efficient digital assistant such questions as:

“Alexa, ask EnergyAustralia how much is my latest bill?”
or
“Alexa, ask EnergyAustralia when is my account due?”

“Alexa now provides EnergyAustralia’s customers with another way to easily engage with us,” says Tony Robertshaw, EnergyAustralia’s Head of Digital and Incubation. “Our skill on Alexa will provide a more closely integrated customer experience and we are thrilled with the result. Shine has been instrumental in launching our first skill.

“The Shine team delivered the outcomes we were seeking in an incredibly short timeframe and worked seamlessly with all the stakeholders involved. EnergyAustralia has worked closely with Shine for over 15 years now – they are a valued digital partner.”

Chatbot technology is certainly in its growth phase in Australia, and the expected growth trajectory is significant, says Shine Director Luke Alexander. “Voice-driven systems are becoming integral to our everyday lives. The potential for companies to forge closer relationships with customers through this technology is exciting – we are proud to partner with EnergyAustralia to help launch the organisation in this space.”

About Shine Solutions:

Shine has been at the forefront of developing enterprise software for 20 years. We are committed to working in partnership with our clients to devise and deliver digital solutions for their business needs. Since launching in 1998, Shine Solutions has forged long-term partnerships with some of Australia’s leading organisations including EnergyAustralia, Telstra, National Australia Bank and Coles.

Shine has offices in Melbourne and Sydney.

Contact:
Luke Alexander, Director
luke.alexander@shinesolutions.com

Trams, Shiners and Googlers!

Shine’s good friend Felipe Hoffa from Google was in Melbourne recently, and he took the time to catch up with our resident Google Developer Expert, Graham Polley. But, instead of just sitting down over a boring old coffee, they decided to take an iconic tram ride around the city. To make it even more interesting, they tested out some awesome Google Cloud technologies by using their phones to spin up a Cloud Dataflow cluster of 50 VMs, and process over 10 billion records of data in under 10 minutes! Check out the video they recorded:

Getting ya music recommendation groove on with Google Cloud Platform! Part 2


In part 1, we learnt about recommendation engines in general and looked at ways to implement a recommendation service using the Google Cloud Platform (GCP). In part 2 of the series, we get our hands dirty with the item-similarity model and its TensorFlow implementation.

This is the first technical blog of the series. Here, I dive deep into the data-processing step and the recommendation service, and share some hints on how to optimise the code for real-time responses. By the end of this post, you should know how to build a simple item-similarity recommender engine.

So let’s get the party started!
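To set the scene, here’s the core idea of item similarity in a few lines of plain NumPy (a toy sketch with made-up numbers, not the TensorFlow code we’ll build in this post): items are compared by the cosine of their user-interaction vectors, and a track’s recommendations are simply its nearest neighbours in the resulting similarity matrix.

```python
import numpy as np

# Toy user x track play-count matrix (4 users, 5 tracks) -- made-up numbers.
plays = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 1],
    [1, 1, 0, 5, 4],
    [0, 0, 5, 4, 0],
], dtype=np.float64)

# Cosine similarity between track (column) vectors.
norms = np.linalg.norm(plays, axis=0, keepdims=True)
normalised = plays / np.clip(norms, 1e-12, None)
item_similarity = normalised.T @ normalised   # shape: (tracks, tracks)

# Recommend the three tracks most similar to track 0 (excluding itself).
track = 0
neighbours = np.argsort(item_similarity[:, track])[::-1]
print([int(t) for t in neighbours if t != track][:3])
```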