Using Google Cloud AutoML to classify poisonous Australian spiders

Warning: This post contains pictures of spiders (and Spiderman)!


Google’s new Cloud AutoML Vision is a new machine learning service from Google Cloud that aims to make state of the art machine learning techniques accessible to non-machine learning experts. In this post I will show you how I was able, in just a few hours, to create a custom image classifier that is able to distinguish between different types of poisonous Australian spiders. I didn’t have any data when I started and it only required a very basic understanding of machine learning related concepts. I could probably show my Mum how to do it!

Trams, Shiners and Googlers!

Shine’s good friend Felipe Hoffa from Google was in Melbourne recently, and he took the time to catch up with our resident Google Developer Expert, Graham Polley. But, instead of just sitting down over a boring old coffee, they decided to take an iconic tram ride around the city. To make it even more interesting, they tested out some awesome Google Cloud technologies by using their phones to spin up a Cloud Dataflow cluster of 50 VMs, and process over 10 billion records of data in under 10 minutes! Check out the video they recorded:

Scheduling BigQuery jobs: this time using Cloud Storage & Cloud Functions


Post update: My good friend Lak over at Google has come up with a fifth option! He suggests using Cloud Dataprep to achieve the same. You can read his blog post about that over here. I had thought about using Dataprep, but because it actually spins up a Dataflow job under-the-hood, I decided to omit it from my list. That’s because it will take a lot longer to run (the cluster needs to spin up and it issues export and import commands to BigQuery), rather than issuing a query job directly to the BigQuery API. Also, there are extra costs involved with this approach (the query itself, the Dataflow job, and a Dataprep surcharge – ouch!). But, as Lak pointed out, this would be a good solution if you want to transform your data, instead of issuing a pure SQL request. However, I’d argue that can be done directly in SQL too 😉

Not so long ago, I wrote a blog post about how you can use Google Apps Script to schedule BigQuery jobs. You can find that post right here. Go have a read of it now. I promise you’ll enjoy it. The post got quite a bit of attention, and I was actually surprised that people actually take the time out to read my drivel.

It’s clear that BigQuery’s popularity is growing fast. I’m seeing more content popping up in my feeds than ever before (mostly from me because that’s all I really blog about). However, as awesome as BigQuery is, one glaring gap in its arsenal of weapons is the lack of a built-in job scheduler, or an easy way to do it outside of BigQuery.

That said however, I’m pretty sure that the boffins over in Googley-woogley-world are currently working on remedying that – by either adding schedulers to Cloud Functions, or by baking something directly into the BigQuery API itself. Or maybe both? Who knows!

Getting ya music recommendation groove on with Google Cloud Platform!


Recommendation systems are found under the hood of many popular services and websites. The e-commerce and retail industry use them to increase their sales, the music services provide interesting songs to their listeners, and the news sites rank the daily articles based on their readers interests. If you really think about it, recommendation systems can be used in pretty much every area of daily life. For example, why not automatically recommend better choices to house investors, guide your friends in your hometown without you being around, or suggest which company to apply to if you are looking for a job.

All pretty cool stuff, right!

But, recommendation systems need to be a lot smarter than a plain old vanilla software. In fact, the engine is made up of multiple machine learning modules that aim to rank the items of the interests for the users based on the users preferences and items properties.

In this blog series, you will gain some insight on how recommendation systems work, how you can harness Google Cloud Platform for scalable systems, and the architecture we used when implementing our music recommendation engine on the cloud. This first post will be a light introduction to the overall system, and my follow up articles will subsequently deep dive into each of the machine learning modules, and the tech that powers them.

Scheduling BigQuery jobs using Google Apps Script

Do you recoil in horror at the thought of running yet another mundane SQL script just so a table is automatically rebuilt for you each day in BigQuery? Can you barely remember your name first thing in the morning, let alone remember to click “Run Query” so that your boss gets the latest data refreshed in his fancy Data Studio charts, and then takes all the credit for your hard work?

Well, fear not my fellow BigQuery’ians. There’s a solution to this madness.

It’s simple.

It’s quick.

Yes, it’s Google Apps Script to the rescue.

Disclaimer: all credit for this goes to the one and only Felipe Hoffa. He ‘da man!

To Partition or not to Partition

I have been using BigQuery for over 2 years now at Shine. I’ve found it to be a great tool that is both incredibly fast and able to handle some of our largest workloads. We are processing terabytes of data per day, and each day an extra billion records are added to the store.

But unfortunately this growth is also increasing our costs of running queries. While BigQuery is extremely fast and parallel, it comes at the cost of needing to scan and pay for every record of the columns you are querying. Without the indexes offered by conventional databases, a full table scan is needed for each query. Not only that but when you query large amounts of data the speed of your query slows down:In this post I’ll talk about how we used table partitions to increase the performance of our queries and avoid query slowdowns.

BigQuery & new users – the top “WTF!?” moments

“What the Fudge?”

I use Google BigQuery a lot. On a daily basis I run dozens of queries, use it to build massively scalable data pipelines for our clients, and regularly help new users navigating it for the first time. Suffice it to say I’m somewhat accustomed to its little quirks. Unfortunately, the same can’t be said for the new users who are commonly left scratching their heads, and shouting “What the fudge!?” at their monitors.

Here’s the top three WTFs that I regularly hear from new BigQuery users:

Beam me up Google – porting your Dataflow applications to 2.x

Will this post interest me?

If you use (or intend to use) Google Cloud Dataflow, you’ve heard about Apache Beam, or if you’re simply bored in work today and looking to waste some time, then yes, please do read on. This short post will cover why our team finally took the plunge to start porting some of Dataflow applications (using the 1.x Java SDKs) to the new Apache Beam model (2.x Java SDK). Spoiler – it has something to do with this. It will also highlight the biggest changes we needed to make when making the switch (pretty much just fix some compile errors).

Whispers from the other side of the globe with BigQuery

Setting the scene

A couple of months ago my colleague Graham Polley wrote about how we got started analysing 8+ years worth of of WSPR (pronounced ‘whisper’) data. What is WSPR? WSPR, or Weak Signal Propagation Reporter, is signal reporting network setup by radio amateurs for monitoring the ability for radio signals to get from one place to another. Why would I care? I’m a geek and I like data. More specifically the things it can tell us about seemingly complex processes. I’m also a radio amateur, and enjoy the technical aspects of  communicating around the globe with equipment I’ve built myself.

Homer simpson at Radio transceiver
Homer Simpson as a radio Amateur

Gobbling up big-ish data for lunch using BigQuery

Beers + ‘WSPR’ = fun

To this day, I’m a firm believer in the benefits of simple, informative, and spontaneous conversations with my colleagues – at least with the ones who can stand me long enough to chat with me . Chewing the fat with other like minded folks over a beer or two is a bloody good thing. It’s how ideas are born, knowledge is shared, and relationships are formed. It’s an important aspect of any business that is sadly all too often overlooked.