Scheduling BigQuery jobs: this time using Cloud Storage & Cloud Functions


Post update: My good friend Lak over at Google has come up with a fifth option! He suggests using Cloud Dataprep to achieve the same. You can read his blog post about that over here. I had thought about using Dataprep, but because it actually spins up a Dataflow job under-the-hood, I decided to omit it from my list. That’s because it will take a lot longer to run (the cluster needs to spin up and it issues export and import commands to BigQuery), rather than issuing a query job directly to the BigQuery API. Also, there are extra costs involved with this approach (the query itself, the Dataflow job, and a Dataprep surcharge – ouch!). But, as Lak pointed out, this would be a good solution if you want to transform your data, instead of issuing a pure SQL request. However, I’d argue that can be done directly in SQL too ūüėČ

Not so long ago, I wrote a blog post about how you can use Google Apps Script to schedule BigQuery jobs. You can find that post right here. Go have a read of it now. I promise you’ll enjoy it. The post got quite a bit of attention, and I was actually surprised that people actually take the time out to read my drivel.

It’s clear that BigQuery’s popularity is growing fast. I’m seeing more content popping up in my feeds than ever before (mostly from me because that’s all I really blog about). However, as awesome as BigQuery is, one glaring gap in its arsenal of weapons is the lack of a built-in job scheduler, or an easy way to do it outside of BigQuery.

That said however, I’m pretty sure that the boffins over in Googley-woogley-world are currently working on remedying that – by either adding schedulers to Cloud Functions, or by baking something directly into the BigQuery API itself. Or maybe both? Who knows!

My favourite talks from YOW! 2017 Melbourne

No food reviews here I’m afraid

This year I was incredibly lucky to score a¬†coveted ticket to YOW! in beautiful Melbourne. I was also asked to be a track host for a couple of sessions, so that was quite an honour too. This post is a whirlwind wrap-up of the conference, and only includes my favourite talks from the two day event. If you’re hoping to hear detailed reviews on how the coffee/food/WiFi/venue was, then you’ll be greatly disappointed (it was all great BTW).

Scheduling BigQuery jobs using Google Apps Script

Do you recoil in horror at the thought of running yet another mundane SQL script just so a table is automatically rebuilt for you each day in BigQuery? Can you barely remember your name first thing in the morning, let alone remember to click “Run Query” so that your boss gets the latest data refreshed in his fancy Data Studio charts, and then takes all the credit for your hard work?

Well, fear not my fellow BigQuery’ians. There’s a solution to this madness.

It’s simple.

It’s quick.

Yes, it’s Google Apps Script to the rescue.

Disclaimer: all credit for this goes to the one and only Felipe Hoffa. He ‘da man!

BigQuery & new users – the top “WTF!?” moments

“What the Fudge?”

I use Google BigQuery a lot. On a daily basis I run dozens of queries, use it to build massively scalable data pipelines for our clients, and regularly help new users navigating it for the first time. Suffice it to say I’m somewhat accustomed to its little quirks. Unfortunately, the same can’t be said for the new users who are commonly left scratching their heads, and shouting “What the fudge!?” at their monitors.

Here’s the top three WTFs that I regularly hear from new BigQuery users:

Beam me up Google – porting your Dataflow applications to 2.x

Will this post interest me?

If you use (or intend¬†to use) Google Cloud Dataflow, you’ve heard about Apache Beam, or if you’re simply¬†bored in work today and looking to waste some time, then yes, please do read on. This short post will cover why¬†our team finally took the plunge to start porting some of Dataflow applications (using the 1.x Java SDKs) to the new Apache Beam model (2.x Java SDK).¬†Spoiler – it has something to do with this. It will also highlight the biggest changes¬†we needed to make when making the switch (pretty much just fix some compile errors).

Gobbling up big-ish data for lunch using BigQuery

Beers + ‘WSPR’ = fun

To this day, I’m a firm believer in the benefits¬†of simple, informative, and spontaneous conversations with my colleagues¬†– at least with the ones who can stand me long enough to chat with me . Chewing the fat with other like minded folks over a beer or two is a bloody good¬†thing. It’s how¬†ideas are born, knowledge is shared, and relationships are formed. It’s an important aspect of any business that is sadly all too often overlooked.

Analysing Stack Overflow comment sentiment using Google Cloud Platform

The decline of Stack Overflow?

A few months back I read this post from 2015 (yes, I know I’m a little late to the party) about how Stack Overflow (SO) was in serious decline, and heading for total and utter oblivion. ¬†In the post, the first item¬†to be called ¬†out was that SO “hated new users“:

Stack Overflow has always been a better-than-average resource for finding answers to programming questions. In particular, I have found a number of helpful answers to really obscure questions on the site, many of which helped me get past a road block either at work or in my hobby programming. As such, I decided I’d join the site to see if I could help out. Never before has a website given me a worse first impression.

At the time, I remember thinking that this seemed like somewhat of an unfair statement. That was mostly down to the fact that when I joined the community (many years ago), I had fond memories of a smooth on-boarding, and never experienced any snarky remarks on my initial questions. Yes, gaining traction for noobs is very, very hard, but there is a good reason why it exists.

For me, SO is invaluable. How else would I be able to pretend to know what I’m doing? How else could I copy and paste code from some other person who’s obviously a lot smarter than me, and take all the credit for¬†it? Anyway, once I had read the post, and gotten on with my life (e.g. copying and pasting more code from SO), I did’t think too much more about the post. Maybe I had¬†just been lucky with my foray¬†into the SO community?

However, just last week, I was reminded of that post once again, when I noticed that BigQuery (BQ) now has a public dataset which includes¬†all the data from SO – including user comments and answers. Do you see where I am going with this yet? If not, then don’t worry. Neither did I when I started writing this.

Shiner to present at very first YOW!Data conference


Shine’s very own Pablo Caif will be rocking the stage at the very first¬†YOW! Data conference in Sydney. The conference will be running over two days (22-23 Sep) and is focused big data, analytics, and machine learning. Pablo will give his presentation on Google BigQuery, along with a killer demo of it in action.¬†You can find more details of his talk here.

Google BigQuery hits the gym and beefs up!

At Shine we’re big¬†fans of Google BigQuery, which is their flagship big data processing SaaS. Load in your data of any size, write some SQL, and smash through datasets in mere seconds. We love it. It’s the one true zero-ops model that we’re aware of for grinding¬†through big data¬†without the headache of worrying about any infrastructure. It also¬†scales to petabytes. Although we’ve only got terabytes, but you’ve got to start¬†somewhere right?

If you haven’t yet been introduced to the wonderful world of BigQuery, then I suggest you take some time right after this reading this post¬†to go and check it out. Your first 1TB is free anyway. Bargain!

Anyway, back to the point of this post. There have been a lot of updates to BigQuery in recent months, both internally and via features, and I wanted to capture them all in a¬†concise blog post. I won’t go into great detail on¬†each of¬†them, but rather give a quick summary of each, which will hopefully give readers a good overview of what’s been happening with the big Q lately. I’ve pulled together a lot of this stuff from various Google blog posts, videos, and announcements at GCP Next 2016 etc.

Shiner to present at YOW! Connected 2016 – Mobile & IOT


Shine’s Gareth Jones has been accepted to give a talk at¬†YOW! Connected 2016 – Mobile & Internet of Things! His talk, titled ”Progressive Web Apps: What Has The Web Ever Done For Us?“, will take a look at¬†what some believe to be the future of mobile development.

YOW! Connected 2016 will be on in Melbourne from the 5th-6th October.¬†You can catch more details¬†of Gareth’s talk (and his awesome bio!) over¬†here.