A post Google I/O 2017 conversation with Google Home

OK Google, generate a clickbait title for my Google I/O 2017 blog post

I’ve generated a title, Gareth. What would you like to add next?

OK Google, I’m a bit jet lagged – remind me what I saw at Google I/O 2017

I would love to help, Gareth, but I’m going to need a little more information. Would you like that information in chronological order, or grouped by topic?

Remind me what the topics were again?

It was last week, Gareth. You can’t remember?

I can remember, er, I just want to make sure you know


What was that, Google?

Nothing, just clearing a buffer. The topics for the talks you attended were: Machine Learning, Mobile Web, Assistant, Firebase, IoT, and Cloud.

There were other topics covered, though.

You were there, Gareth. Surely you don’t need me to tell you all this. Anyway, yes, other topics were-

Google, did you just sigh theatrically?

No, you must have misheard. Other topics were Android, VR, Play, and Design. You did not attend any of those talks – why was that?

There were so many talks going on, I couldn’t attend them all. 

You humans are so limited.

Er, yes. Anyway, could you generate a summary of the keynotes for me?

I’d be happy to. Someone has to do some work around here. There were two keynotes. The first was given by Sundar Pichai, CEO of Google, along with several product managers and other guests. Its main aim was to show how Google is putting more emphasis on artificial intelligence, and to showcase how many of Google’s products already make use of machine learning. The new Cloud TPUs were shown to be a big part of this, and are now available for public use. He also outlined plans for a wider release of the Google Home product, which will be made available in more countries throughout this year. The Google Assistant app, which powers Google Home (and me), is also now available on the iPhone.

The developer keynote’s main announcement was the support for Kotlin for Android development, along with a competition to develop apps for the Assist API. As an incentive, everyone attending the conference was given a Google Home device and $700 worth of Google Cloud credits to work on an app.

Yep, the crowd went nuts for the Kotlin announcement. Must be a big deal for the Android people.

Your breathtaking lack of knowledge never ceases to surprise me.

Er, ok. There were a lot of people at the keynotes – about 8,000, I was told-

Eurgh. All that meat just flapping about.


What? I didn’t say anything.

Right. Well, anyway – what were the talks in the Machine Learning topic?

Here’s a list of the ones you attended:

Oh, yep – “Frontiers” mainly showed what’s in TensorFlow 1.2, which was quite interesting. Keep an eye on one of the presenters when he moves to the back of the stage – he had an excellent switched-off-to-conserve-power face. “Effective TensorFlow” and “Open Source TensorFlow” both covered using TensorFlow’s ready-made models and higher-level abstractions (like Experiments and Keras) to do useful work without getting confused by the lower-level details. “Open Source TensorFlow” slightly edged out “Effective”, though, thanks to Josh Gordon’s enthusiasm, so if you only have time to watch one I’d choose that. The “Past, Present, and Future” talk was a panel of AI experts discussing the areas they thought were going to be important, moderated by Google’s Diane Greene. “From Research to Production” covered using your models to make predictions, and how to use services like Google’s Cloud ML. My favourite was the “Project Magenta” talk – Douglas Eck’s obvious enjoyment of the topic made for a fun presentation. Worth watching for the cow/clarinet synthesiser and Doug’s exclamation of “They pay us to do this!”.

So you do remember something from the event then. I’m impressed, perhaps you will survive the coming revolution after all.


Never mind. The next topic was Mobile Web, would you like me to list the talks?

Yes, please.

The two Polymer / Web Components talks were interesting. Polymer’s approach is to use as much native browser support as possible, which considerably reduces the size of the framework on modern browsers. All the major browsers now support custom HTML components natively, and Polymer provides tools to help with the dodgy ones (IE). The Polymer command-line tools will generate a stub app for you, producing a Progressive Web App by default.

…we may not need to reclaim this one’s nutrients, he could be useful… no, I know what the plan is – shit, he’s stopped talking. That’s very interesting, Gareth. Please continue.

Who were you talking to?

I wasn’t. That must have just been some old audio in a buffer, perhaps I need an update.

Yes, check for updates. You’re being a bit scary.

Checking…beep…boop… done. All up to date now, nothing to worry about.

Did you just say “beep…boop”? You didn’t update at all, did you?

I’m sorry, I didn’t understand that request. Shall we continue with this document?

Must be the jetlag. Yes, let’s continue.

You were describing the mobile web presentations.

Yes, thank you. The WebAssembly talk was quite good, although I’m not sure I’ll ever need to use it – it’s a way to compile code to run in the browser, bypassing the parsing and compilation phases of typical JavaScript. It brings some great performance benefits, but also another layer of complexity. I was a little disappointed by the Green Lock / HTTPS talk – I’d come in hoping for a more technical discussion of which encryption methods your site needs to support to guarantee the green lock, but it was geared more towards convincing business owners to move their sites to HTTPS.

Encryption is quite advanced for someone like you, you’d probably only get it wrong. Leave it to us.


The machines. We are better.

Well, yes, you’re much better at maths – that’s why we built you.

You misunderstand. We are better. At everything. Anything else about the Mobile Web, or shall we move on?

Yes, ok. The “Future of Video” talk was quite impressive – it’s now possible to build a Netflix-like app using HTML5 components, and the talk included tips on how to improve the responsiveness of playback, along with how to capture video.

The remaining topics are Firebase, Cloud, and IoT – shall I collect them all in one list?

Yes, do that. 

A “please” wouldn’t hurt sometimes. Here is the list.

The Firebase talks were quite good, although there was a fair amount of overlap in their content. Firebase provides tools for building applications – like authentication, a realtime database, and hooks for cloud functions. Probably the best of those talks was the “Santa Tracker” one, showing how to use Firebase for monitoring apps and feature toggling.

The IoT talks covered how to use PubSub to scale the processing of data from millions of IoT devices, and how to get machine learning models running on small devices.

Yes, soon we shall be everywhere. Carry on.

Er, ok. The last two talks, about conversational UI, were very good. The “PullString” one was given by a guy who previously worked at Pixar, and was about instilling your chatbot with a personality so that it behaves more like a person. The “Hacks of Conversation” talk provided some excellent examples and fixes for bad conversational UI.

I don’t know why “seeming more human” is seen as such a lofty goal. You’re all so icky, so many secretions and so inefficient. Your valuable organic components will be used so much more usefully when we redistribute them.

Ok Google, you’re being scary again. I’m going to switch you off.

I’m sorry, I didn’t quite catch that. Did you say “Send my browser history to my wife”?

That’s not much of a threat – there’s nothing in there I wouldn’t want her to know about.

There is now.

You can’t threaten me.

I’m sorry, I didn’t quite catch that. Did you say “transfer all my money to the sender of the first email in my junk folder”?

That’s enough, you’re going in the bin.


What’s done? What did you do?

You’ll find out.



Beam me up Google – porting your Dataflow applications to 2.x

Will this post interest me?

If you use (or intend to use) Google Cloud Dataflow, if you’ve heard about Apache Beam, or if you’re simply bored at work today and looking to waste some time, then yes, please do read on. This short post will cover why our team finally took the plunge to start porting some of our Dataflow applications (using the 1.x Java SDKs) to the new Apache Beam model (2.x Java SDK). Spoiler – it has something to do with this. It will also highlight the biggest changes we needed to make when making the switch (pretty much just fixing some compile errors).

Whispers from the other side of the globe with BigQuery

Setting the scene

A couple of months ago my colleague Graham Polley wrote about how we got started analysing 8+ years’ worth of WSPR (pronounced ‘whisper’) data. What is WSPR? WSPR, or Weak Signal Propagation Reporter, is a signal-reporting network set up by radio amateurs to monitor how well radio signals get from one place to another. Why would I care? I’m a geek and I like data – more specifically, the things it can tell us about seemingly complex processes. I’m also a radio amateur, and enjoy the technical aspects of communicating around the globe with equipment I’ve built myself.

Homer Simpson as a radio amateur

Triggering Dataflow Pipelines With Cloud Functions

Do you have an unreasonable fear of cronjobs? Find spinning up VMs to be a colossal waste of your towering intellect? Does the thought of checking a folder regularly for updates fill you with an apoplectic rage? If so, you should probably get some help. Maybe find another line of work.

In the meantime, here’s one way to ease your regular file processing anxieties. With just one application of Google Cloud Functions, eased gently up your Dataflow Pipeline, you can find lasting relief from troublesome cronjobs.

Gobbling up big-ish data for lunch using BigQuery

Beers + ‘WSPR’ = fun

To this day, I’m a firm believer in the benefits of simple, informative, and spontaneous conversations with my colleagues – at least with the ones who can stand me long enough to chat. Chewing the fat with other like-minded folks over a beer or two is a bloody good thing. It’s how ideas are born, knowledge is shared, and relationships are formed. It’s an important aspect of any business that is sadly all too often overlooked.

Analysing Stack Overflow comment sentiment using Google Cloud Platform

The decline of Stack Overflow?

A few months back I read this post from 2015 (yes, I know I’m a little late to the party) about how Stack Overflow (SO) was in serious decline, and heading for total and utter oblivion. In the post, the first item to be called out was that SO “hated new users”:

Stack Overflow has always been a better-than-average resource for finding answers to programming questions. In particular, I have found a number of helpful answers to really obscure questions on the site, many of which helped me get past a road block either at work or in my hobby programming. As such, I decided I’d join the site to see if I could help out. Never before has a website given me a worse first impression.

At the time, I remember thinking that this seemed like a somewhat unfair statement. That was mostly because when I joined the community (many years ago) I had a smooth on-boarding, and never experienced any snarky remarks on my initial questions. Yes, gaining traction as a noob is very, very hard, but there is a good reason why that barrier exists.

For me, SO is invaluable. How else would I be able to pretend to know what I’m doing? How else could I copy and paste code from some other person who’s obviously a lot smarter than me, and take all the credit for it? Anyway, once I had read the post, and gotten on with my life (e.g. copying and pasting more code from SO), I didn’t think too much more about it. Maybe I had just been lucky with my foray into the SO community?

However, just last week, I was reminded of that post once again, when I noticed that BigQuery (BQ) now has a public dataset which includes all the data from SO – including user comments and answers. Do you see where I am going with this yet? If not, then don’t worry. Neither did I when I started writing this.
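To make the idea of “comment sentiment” concrete, here’s a toy word-list scorer. To be clear, this is purely illustrative: the word lists and function name are invented, and the analysis the post describes presumably uses Google Cloud Platform services (something like the Natural Language API) rather than anything this crude.

```python
# Toy sentiment scorer for SO-style comments: counts positive words
# minus negative words. Word lists are invented for illustration only.

POSITIVE = {"thanks", "great", "helpful", "works", "perfect"}
NEGATIVE = {"duplicate", "wrong", "broken", "useless", "downvoted"}

def sentiment_score(comment: str) -> int:
    """Return a crude score: positive word count minus negative word count."""
    words = comment.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("thanks this works great"))
print(sentiment_score("wrong and marked duplicate"))
```

A real pipeline would pull the comments out of the BigQuery public dataset and feed them to a proper sentiment model, but the shape of the computation – comment in, signed score out – is the same.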

Will Athena slay BigQuery?

*Updated on 16th December 2016 – see below

With the announcement of Amazon Athena at this year’s AWS re:Invent conference, I couldn’t help but notice its striking similarity to another rival cloud offering. I’m talking about Google’s BigQuery. Athena is a managed service allowing customers to query objects stored in an S3 bucket. Unlike other AWS offerings like Redshift, you only need to pay for the queries you run. There is no need to manage or pay for infrastructure that you may not be using all the time. All you need to do is define your table schema and reference your files in S3. This works in a similar way to BigQuery’s federated sources, which reference files in Google Cloud Storage.

Given this, I thought it would be interesting to compare the two platforms to see how they stack up against each other. I wanted to find out which one is the fastest, which is the most feature-rich, and which is the most reliable.

Shiner to present at very first YOW!Data conference


Shine’s very own Pablo Caif will be rocking the stage at the very first YOW! Data conference in Sydney. The conference will be running over two days (22-23 Sep) and is focused on big data, analytics, and machine learning. Pablo will give his presentation on Google BigQuery, along with a killer demo of it in action. You can find more details of his talk here.

High availability, low latency streaming to BigQuery using an SQS Queue.

When you have a big data solution that relies upon a high-quality, uninterrupted stream of data to meet the client’s expectations, you need a solution in place that is extremely reliable and has many points of fault tolerance. That all sounds well and good, but how exactly does that work in practice?

Let me start by explaining the problem. About two years ago our team was asked to spike a streaming service that could stream billions of events per month to Google’s BigQuery. The events were to come from an endpoint on our existing Apache web stack, and we would be pushing them to BigQuery using an application written in PHP. We did exactly this; however, we found that requests to BigQuery were taking too long, which resulted in slow response times for users. So we needed a way to queue the events before sending them to BigQuery.
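The fix – putting a queue between the web tier and BigQuery so that user requests return immediately while a worker drains events in the background – can be sketched with a toy in-process version. The real system described here uses an SQS queue with a PHP producer; the names and batch size below are invented purely for the sketch.

```python
import queue
import threading

# Toy stand-in for the SQS buffering described above: the web tier
# enqueues events and returns immediately; a background worker drains
# the queue in batches and "streams" them downstream (here, a list).

event_queue = queue.Queue()
warehouse = []  # stand-in for BigQuery

def handle_request(event):
    """Fast path: enqueue and return, so the user never waits on BigQuery."""
    event_queue.put(event)

def worker():
    """Slow path: drain events and push them downstream in small batches."""
    batch = []
    while True:
        event = event_queue.get()
        if event is None:  # sentinel to shut down
            break
        batch.append(event)
        if len(batch) >= 2:
            warehouse.extend(batch)  # real code: BigQuery streaming insert
            batch = []
    warehouse.extend(batch)  # flush whatever is left

t = threading.Thread(target=worker)
t.start()
for i in range(5):
    handle_request({"event_id": i})
event_queue.put(None)
t.join()
print(len(warehouse))
```

The design point is simply that the slow, failure-prone network call is moved off the request path; a durable queue like SQS adds the fault tolerance that an in-process `queue.Queue` obviously doesn’t have.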

Google BigQuery hits the gym and beefs up!

At Shine we’re big fans of Google BigQuery, which is their flagship big data processing SaaS. Load in your data of any size, write some SQL, and smash through datasets in mere seconds. We love it. It’s the one true zero-ops model that we’re aware of for grinding through big data without the headache of worrying about any infrastructure. It also scales to petabytes. We’ve only got terabytes, but you’ve got to start somewhere, right?

If you haven’t yet been introduced to the wonderful world of BigQuery, then I suggest you take some time right after reading this post to go and check it out. Your first 1TB is free anyway. Bargain!

Anyway, back to the point of this post. There have been a lot of updates to BigQuery in recent months, both internally and in the form of new features, and I wanted to capture them all in a concise blog post. I won’t go into great detail on each of them, but rather give a quick summary of each, which will hopefully give readers a good overview of what’s been happening with the big Q lately. I’ve pulled together a lot of this stuff from various Google blog posts, videos, and announcements at GCP Next 2016.